In recent years, psychometricians and cognitive
psychologists have begun to conceptualize research questions
which span both fields. Concern about the nature
of the trait being measured and its relationship to
differential cognitive abilities has grown. One of the
questions which has arisen from this area of research
involves the degree of relationship between types of test
items and the mental processes which are required to
succeed on the items. However, little empirical research
has been done relating differential cognitive abilities
to success on essay and objective tests. This specific
problem is the focus of this study.

In this study it was anticipated that a combination
of student and test attributes would help to explain why
many students score differently on essay and objective
examinations which were designed to measure the same course
content. Two sets of variables were characterized as
student attributes. The first set included patterns of
fluid, crystallized and creative abilities which predicted
test performance. The other set included the
previous experience of the students with the subject
matter and the test format.

Student  attributes  alone  were  not  likely  to  explain 
the  variation  in  performance  on  essay  and  objective  tests. 
Attributes of the tests themselves could influence
student performance. Therefore, the test items were
classified according to their required intellectual process.
The essay item was designed to measure the ability
of  the  students  to  synthesize  the  material  and  to  express 
any  creative  insights  into  the  conceptual  relationships. 
One  half  of  the  objective  test  items  required  the  abstract 
reasoning  ability  associated  with  fluid  intelligence. 
The  remaining  items  were  designed  to  measure  the  ability 
to  comprehend  and  analyze  the  subject  matter;  these  skills 
were more closely related to the crystallized abilities
in  Cattell's  framework.   Items  associated  with  fluid 
ability  were  labeled  abstract,  and  items  which  required 
crystallized  abilities  were  labeled  concrete. 

This  study  was  designed  to  evaluate  the  argument 
that  a  broad,  unstructured  essay  measured  abstract  and 
creative  reasoning  abilities  while  objective  tests  tended 
to measure more crystallized abilities. The counter argument
that differences in scores on essay and objective
tests could be attributed to differences in the abilities
required  to  succeed  on  abstract  and  concrete  objective 
test  items  was  also  evaluated.   The  investigation  was 
conducted  in  four  stages.   A  preliminary  phase  of  the 
study  involved  a  description  of  the  sample  and  a  factor 
analytic  investigation  of  the  independence  of  a  creativity 
dimension.   Next,  the  relative  contribution  of  the  ability 
measures  to  the  prediction  of  performance  on  the  essay  and 
objective  tests  was  compared  using  multiple  regression 
techniques.   The  contribution  of  fluid  and  crystallized 
ability  to  the  prediction  of  concrete  and  abstract  items 
was  compared  using  multiple  regression  techniques  in  the 
third  stage  of  the  analysis.   The  final  phase  of  the  study 
was  an  investigation  of  the  premise  that  a  combination  of 
fluid,  crystallized  and  creative  abilities  and  scores  on 
concrete  and  abstract  items  could  be  used  to  classify 
students  as  better  on  essays  or  better  on  objective  tests. 
Discriminant  function  and  classification  analyses  were 
conducted  to  answer  this  question. 

A  separate  creativity  dimension  was  established. 
However, there were no significant differences in the regression
weights of the equations used to predict success on
essay  and  objective  tests  or  the  equations  written  to 
predict  success  on  abstract  and  concrete  items.   The 
discriminant function which separated students who scored
higher  on  the  essay  examination  from  those  who  scored 
higher  on  the  objective  examination  reached  statistical 
significance.   Students  who  scored  higher  on  the 
objective examination tended to earn higher scores on
the abstract items associated with fluid intelligence.
Students who scored higher on the essay test tended
to earn higher scores on the concrete objective test
items.

CHAPTER I
INTRODUCTION 

Traditionally,  one  measure  of  a  man  has  been  the 
elegance  of  his  prose.   Yet,  the  vagaries  inherent  in 
the  scoring  of  essays  were  recognized  as  early  as  1912 
in  studies  conducted  by  Starch  and  Elliot.   Essays  were 
not  found  to  be  valid  indicators  of  future  academic 
achievement  and  pressure  mounted  for  the  adoption  of 
new  types  of  examinations  which  were  objective  in  form 
and for which a sophisticated technical theory of test
construction  and  scoring  was  being  developed. 

The  use  of  objective  examinations  did  not  resolve 
the  conflict  about  the  validity  of  testing.   The 
validity of either examination form could not be established
unless the reliability of both essay and objective
tests  was  improved.   Psychometricians  have  tended  to 
concentrate  upon  improving  the  technical  quality  of  the 
tests.   Through  research  they  sought  to  identify  and 
reduce  sources  of  measurement  error  to  improve  the 
reliability  of  test  scores.   The  validity  questions 
were  investigated  through  correlational  studies  of 
essay  and  objective  tests  and  other  criteria  such  as 
related  class  exercises.   Seldom  were  psychometricians 
interested in the intellectual processes which the
tests  were  purported  to  measure. 

The  intellectual  process  involved  in  learning 
became  the  concern  of  the  cognitive  psychologists. 
These  studies  represented  attempts  to  identify  and 
classify the mental abilities which constitute intelligence
and relate to specific learning tasks. As
Carroll (1974) noted, the psychometricians and
cognitive psychologists pursued two practically nonoverlapping
areas of research, both of which were
critically  involved  with  the  validity  of  testing. 

The  Problem 
In  recent  years,  psychometricians  and  cognitive 
psychologists have begun to conceptualize research questions
which span both fields (e.g., Hunt et al., 1975;
Horn, 1976; Cronbach, 1975; Stenhouse, 1976). Concern
about  the  nature  of  the  trait  being  measured  and  its 
relationship  to  differential  cognitive  abilities  has 
grown.   The  literature  on  aptitude-treatment  interactions 
(ATI)  is  replete  with  examples  of  these  studies.   One  of 
the  questions  which  has  arisen  from  this  area  of  research 
concerns  the  degree  of  relationship  between  types  of 
test  items  and  the  mental  processes  which  were  required 
to  succeed  on  the  items.   The  relationship  between 
different  mental  abilities  and  the  ability  to  solve 
various  types  of  analogy  items  was  investigated  by 
Whitely  (1975).   However,  relatively  little  empirical 
research has been done which related differential abilities
to  success  on  essay  and  objective  tests.   This  specific 
problem  is  the  focus  of  this  study. 
The  Measurement  Approach  to  the  Problem 

Prior  to  an  examination  of  the  relationships  between 
cognitive  abilities  and  test  formats,  several  measurement 
problems  must  be  addressed.   Early  comparison  studies 
between essay and objective examinations yielded substantial
correlations between scores on essay and objective
tests, but the uncorrelated variance may have been due
either to a difference in the function of the tests or to
measurement error (Weidemann & Newens, 1933; Vernon,
1959, 1962; Andrews, 1968; Godshalk, Swineford, &
Coffman, 1966). Attempts to verify the unique contribution
of essay tests by Godshalk et al. (1966), Modu
(1972), and Andrews (1968) had contradictory results.
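
The ambiguity between a difference in function and measurement
error can be illustrated with the classical correction for
attenuation, which estimates the correlation between two tests
after allowing for the unreliability of each. The sketch below is
illustrative only: the function name and the numerical values are
assumptions and are not taken from the studies cited above.

```python
import math


def disattenuated_correlation(r_xy, r_xx, r_yy):
    """Classical correction for attenuation.

    r_xy : observed correlation between the two tests
    r_xx : reliability estimate for the first test
    r_yy : reliability estimate for the second test
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed r = .60, essay reliability = .64,
# objective test reliability = .81.
corrected = disattenuated_correlation(0.60, 0.64, 0.81)
```

Under these assumed values the corrected correlation is still below
1.0, which would leave room for a difference in function; with
lower reliabilities the same observed correlation could be
explained by measurement error alone.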

Vernon  (1961)  stated  that  it  was  logical  that 
scores from essay and objective tests over the same
content would be correlated. Even though essay questions
may have been designed to measure higher level mental
processes, evidence indicated that some essays contained
substantial factual level information and objective
tests may have directly or indirectly measured understanding
and thinking processes. However, both types of tests
were imperfect measures.

Measurement errors in objective tests could be
traced to a variety of sources: ambiguities due to
directions, choice of foils, and relevance to course
material and assessment of complex thinking. Vernon
cited  studies  in  which  objective  tests  developed  by 
different  instructors  covering  the  same  material  and 
administered  to  the  same  students  only  correlated  about 
0.50. Corrections for differences in difficulty of the
items and scoring techniques did not account for the low
magnitude  of  these  correlations.   Vernon  concluded  that 
there  may  be  a  trade  off  between  the  lower  reliability 
of  the  essay  and  the  validity  problems  of  the  objective 
tests.   Moreover,  "since  the  errors  which  reduce  the 
validities are different it follows that they measure
somewhat different aspects of ability" (Vernon, 1961,
p.  228). 

Vernon  argued  that  if  the  value  of  the  essay  as  a 
measurement  device  was  that  it  tended  to  give  greater 
opportunity  for  the  measurement  of  the  more  complex  mental 
processes,  the  essay  item  should  be  broad  and  not  highly 
structured.   Essays  should  also  be  marked  for  imagination, 
fluency  and  unusual  understanding.   While  short,  tightly 
structured  essays  were  more  reliably  scored,  multiple 
essays  of  this  type  may  be  only  poor  substitutes  for 
well constructed objective test items.

The  results  of  attempts  to  identify  skills  unique 
to  essay  examinations  have  been  inconclusive.   Yet,  the 
evidence  that  some  students  consistently  differ  in  their 
ability to write essay and objective tests has grown.
French reported at the 1962 ETS invitational conference
on testing that SAT scores were related to the mechanics
of writing but not to the style of writing. French
asked:   "Isn't  this  what  English  teachers  have  been 
telling  us  psychometricians  all  along  that  essay  tests  of 
writing  ability  measure  something  that  objective  tests 
do  not  measure?   They  seem  to  be  right.  ..."  (French, 
1962,  p.  27).   Biggs  and  Braun  (1972)  reported  that  it 
appeared  likely  that  the  students  scoring  in  the  middle 
of  the  distribution  were  the  most  severely  affected. 
Students  at  either  end  of  the  distribution  tend  to  perform 
about  the  same  on  either  test  format.   The  problem,  as 
Coffman  (1971)  suggested,  is  to  find  valid  criteria  to 
document  and  explain  these  differences  in  ability. 
The  Cognitive  Approach  to  the  Problem 

The  ability  to  write  essays  may  be  more  than  a 
combination  of  knowledge  of  the  content  and  good  form;  it 
may  also  include  particular  cognitive  abilities.   A 
theoretical  framework  in  which  to  place  an  investigation 
of  this  premise  has  been  developed  by  Cattell.   The 
differentiation  of  mental  abilities  into  crystallized  and 
fluid analytic abilities was presented in Cattell and
Butcher (1968). These forms of intelligence were second
order correlated factors of Guilford's primary abilities.

Fluid  intelligence  represents  an  analytic  ability 
measured by culture free, non-verbal tests while crystallized
ability is related to those skills taught in a
particular  culture.   Cattell  and  Butcher  stated  that, 
"The  extent  to  which  the  skills  will  be  correlated  will  be 
a  function  both  of  the  extent  to  which  they  are  dependent 
upon the single gf ability, and of the extent to which
they  have  received  the  same  length  of  training  and  the 
same  opportunities  to  improve"  (p.  21). 

The distinction between these abilities is clarified
in the following passage, "Crystallized ability . . .
loads more highly those cognitive performances in which
certain initial intelligent judgments have become
crystallized as habits. That is to say, fluid general
ability, which is in some ways the more fundamental of
the two, has at some time been applied in this field,
and the individual by memorizing former responses, is
enabled to make further new judgments . . . . Fluid
general ability, on the other hand, shows more in tests
requiring  adaptations  to  entirely  new  situations  where 
crystallized  skills  are  of  no  advantage  because  they  do 
not  apply  to  the  particular  data"  (p.  19). 

An  extension  of  the  fluid  and  crystallized  ability 
framework which included a separate creativity dimension
was  advocated  by  Horn  (1976).   The  literature  surrounding 
the  controversy  over  the  existence  of  an  independent 
creativity  dimension  has  been  reviewed  in  Chapter  II. 
The  relevance  of  creativity  in  the  scoring  of  essays  is 
apparent.   However,  the  existence  of  an  independent 
creativity  factor  for  these  data  was  established  in  order 
to include creativity in the conceptual framework of
this  study. 

Support for the association between ability differences
measured in different test formats can be found in an
article by Snow (1976). This article reviewed research
on studies of aptitude treatment interactions and focused
attention upon two complex hypotheses which Snow believed
deserved the most study. The first complex was identified
as the A_i A_c A_x complex, which asserts that individual
differences in anxiety (A_x), achievement via independence
(A_i), and achievement via conformity (A_c) interact with
instructional treatments differing in their degree of
teacher structure and student participation. The summary
of studies investigating this complex supports the general
hypothesis that anxious conforming students perform better
with structured learning while independent, confident
students do less well in structured learning situations.

In  the  study  conducted  by  Biggs  and  Braun  (1972) 
these  personality  constructs  were   related  to  differences 
in  achievement  on  essay  and  objective  test  formats. 
Scores  from  five  tests  were  factor  analyzed  and  essay  and 
objective  test  factors  emerged.   Again  the  best  students 
did  well  on  both  measures,  but  middle  range  students 
did  poorer  than  expected  on  the  objective  examinations. 
It appeared that students who excelled in structured
situations did better on objective tests and more independent
students performed better on essay tests.

In  most  aptitude-treatment  interaction  studies,  the 
personality  constructs  are  related  to  general  ability 
measures.   However,  as  Snow  stated,  most  ATI  research 
is  based  upon  an  undifferentiated  general  intelligence. 
Since general intelligence is itself a complex of fluid-analytic
intelligence (Gf) and crystallized-verbal
intelligence (Gc), a better understanding of the differences
in achievement measured by essay and objective
tests may be obtained by focusing upon differentiated
ability rather than upon differences in personality
constructs.

Purpose of the Study

Student performance on examinations over the same
subject matter has often varied depending upon the format
of the test. In this study it was anticipated that a
combination of student and test attributes would help
explain why many students score differently on essay
and objective examinations which were designed to measure
understanding of the same course content.

The purpose of this study was to examine the contribution
of student and test attributes to the prediction
of performance on essay and objective tests. Two sets
of variables were characterized as student attributes.
The first set included measures of fluid, crystallized
and creative abilities which predict test performance.
The other set concerned the previous experience of the
students with the subject matter and the test format.

Student  attributes  alone  were  not  likely  to  explain 
the  variation  in  performance  on  essay  and  objective  tests. 
Attributes of the tests themselves could influence student
performance. Therefore, the test items were classified
according to their required intellectual process. The
essay item was designed to measure the ability of the
students  to  synthesize  the  material  and  to  express  any 
creative  insights  into  the  conceptual  relationships 
relevant  to  the  question.   The  objective  test  was  equally 
divided  between  items  which  required  the  abstract 
reasoning  associated  with  fluid  intelligence  and  items 
designed  to  measure  the  more  crystallized  abilities  of 
comprehension  and  analysis.   Items  associated  with  fluid 
ability were labeled abstract, and items which required
crystallized  abilities  were  labeled  concrete. 

This  study  was  designed  to  evaluate  the  argument 
that  a  broad,  unstructured  essay  measures  abstract  and 
creative  reasoning  abilities,  and  differences  in  the 
scores on the essay and objective test could be attributed
to differences in the abilities required to succeed
on  abstract  and  concrete  objective  test  items. 

The  following  questions  have  directed  this  inquiry. 

1.  Is  there  a  linear  relationship  between  the  essay 
and  objective  test  scores? 

2. What are the patterns of fluid, crystallized
and creative abilities which can be used to predict scores
on the essay and objective tests?

3.  Are  different  patterns  of  abilities  predictive 
of  success  on  essay  and  objective  tests? 

4.  Is  there  a  difference  in  the  patterns  of  abilities 
which  are  related  to  successful  performance  on  concrete  and 
abstract  test  items? 

5. Will linear combinations of fluid, crystallized
and  creative  abilities  and  scores  on  concrete  and  abstract 
items  predict  a  difference  in  scores  between  essay  and 
objective  tests? 

The questions directing the inquiry have been reworded
to test the following statistical hypotheses:

1.   The  multiple  correlation  coefficients  describing 
the  regression  of  essay  and  objective  test  scores  on  fluid, 
crystallized  and  creative  abilities  are  equal  to  zero. 

a.  The  measure  of  fluid  ability  will  not  increase 
the  accuracy  of  prediction  beyond  the  variance  predicted 
by  crystallized  abilities  for  essay  and  objective  test 
scores. 

b.  Creativity  test  scores  will  not  increase  the 
accuracy  of  prediction  of  essay  and  objective  test 
scores  beyond  the  variance  predicted  by  measures  of 
fluid  and  crystallized  abilities. 

c.  The  previous  experience  with  test  format  and 
number  of  related  courses  will  not  significantly  increase 
the  variance  explained  in  objective  and  essay  test 
scores  by  fluid,  crystallized,  and  creative  abilities. 

d.  The  interaction  of  crystallized  abilities  and 
creativity  will  not  significantly  increase  the  variance 
explained  in  objective  and  essay  test  scores  by  fluid, 
crystallized  and  creative  abilities,  and  the  experience 
variables.

e.  There  are  no  significant  differences  in  the 
regression  weights  for  measures  of  fluid,  crystallized 
and creative abilities in the prediction of success
on  essay  and  objective  tests. 

2. The multiple correlation coefficients representing
the regression of scores of concrete and abstract objective
test items on fluid and crystallized abilities are zero.

a.  The  measures  of  fluid  ability  will  not 
increase  the  accuracy  of  prediction  of  concrete  and 
abstract  objective  test  items  beyond  the  variance 
predicted  by  the  measures  of  crystallized  ability. 

b.  There  are  no  significant  differences  in 
the  regression  weights  for  measures  of  fluid  and 
crystallized  abilities  in  the  prediction  of  success 
on  concrete  and  abstract  items. 

3.  There  are  no  significant  differences  in  the 
pattern  of  abilities  for  students  who  score  high  on 
objective  tests,  high  on  essay  tests,  or  equally  on 
essay  and  objective  tests. 
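
Hypotheses 1a through 1d and 2a each ask whether an added set of
predictors increases the variance explained beyond an earlier set.
A common way to test such an increment is the F test on the change
in R-squared between nested regression models. The sketch below
assumes that approach; the function names and the synthetic data
are hypothetical and do not reproduce the computations or the data
of this study.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def incremental_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the gain in R^2 when predictors are added.

    k_reduced and k_full count predictors, excluding the intercept.
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


# Synthetic scores for illustration only (not the study's data).
rng = np.random.default_rng(0)
n = 200
gc = rng.normal(size=n)  # hypothetical crystallized ability scores
gf = rng.normal(size=n)  # hypothetical fluid ability scores
essay = 0.5 * gc + 0.3 * gf + rng.normal(size=n)

# Step 1: crystallized ability alone; step 2: add fluid ability.
r2_gc = r_squared(gc.reshape(-1, 1), essay)
r2_gc_gf = r_squared(np.column_stack([gc, gf]), essay)
f_stat, df1, df2 = incremental_f(r2_gc, r2_gc_gf, n, 1, 2)
```

The resulting statistic would be referred to an F distribution with
df1 and df2 degrees of freedom. The same function applies to the
later steps by treating the creativity, experience, and interaction
terms as the added block.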

Significance  of  the  Study 
The  dual  purpose  of  testing  as  a  teaching  device 
and  an  assessment  tool  has  long  been  recognized.   But, 
the  effect  of  testing  upon  learning  and  the  validity  of 
tests  as  measures  of  learning  are  continually  being 
debated.   Traditional  measurement  research  related  to 
this  debate  has  centered  upon  the  elimination  of  errors 
due  to  the  construction  and  scoring  of  tests.   In  spite 
of  the  technical  quality  of  current  tests,  educators 
have become increasingly concerned with the clarification
of what is being tested. The extensive publications
on  criterion  referenced  testing  and  the  current  emphasis 
on  functional  skills  assessment  are  testimony  to  this 
interest. 

A  further  step  in  the  examination  of  the  validity 
of a test is to uncover the relationship between the form
of the measurement and the trait being tested (Campbell
& Fiske, 1959; Cronbach & Meehl, 1955). The manner in
which an ability is assessed may influence which mental
functions are used by students in the testing situation.
Moreover, systematic differences in the type of intelligence
required to succeed on different test formats may
obscure  the  validity  of  the  measurements  of  students' 
abilities. 

This  study  represents  an  attempt  to  strengthen  the 
tie  between  the  academic  disciplines  of  measurement  and 
cognitive  psychology.   An  insight  into  the  form  of 
testing  and  the  ability  being  measured  may  provide  new 
directions  for  validity  studies. 

Organization  of  the  Study 
The  rationale  for  this  inquiry  is  explained  in 
Chapter  I.   The  literature  pertaining  to  the  theoretical 
framework of the study and relevant empirical studies
are  discussed  in  Chapter  II.   Chapter  III  contains  a 
description  of  the  methods  used  to  empirically  investigate 
the relationship between scores on essay and objective
tests and fluid and crystallized ability. Results of
this study are reported in Chapter IV. The discussion
of the results with the conclusions and implications of
the findings are presented in Chapter V. The final
chapter includes a summary of the study.

CHAPTER II
REVIEW  OF  THE  LITERATURE 

The  literature  review  has  been  organized  around  the 
following  topics:   comparative  validity  of  essay  and 
objective  tests,  cognitive  and  creative  components  of 
intelligence,  problems  in  the  scoring  of  essays,  and 
problems  in  estimating  the  reliability  of  essays.   These 
topics  are  interrelated.   An  assessment  of  the  validity 
of  essays  or  objective  tests  is  incomplete  unless  the 
mental  abilities  being  tested  are  understood.   However, 
in order to document the validity of the tests which
measure  these  abilities,  a  minimum  level  of  reliability 
of  the  instrument  had  to  be  assured.   Methods  to  score 
essays  and  to  establish  the  reliabilities  of  those  scores 
have  been  developed  in  answer  to  this  need. 

From  this  review  of  the  literature  a  theoretical 
framework was constructed. Patterns of fluid and crystallized
intelligence and creativity were related to reliably
scored  essay  and  objective  tests.   Factors  in  the 
construction  of  the  examinations  as  well  as  differences 
in  the  experience  of  the  students  with  the  course  content 
and  the  test  formats  have  been  included  in  the  design  of 
the  study. 

Comparative Validity Studies

Comparisons  between  essay  and  objective  examinations 
have  been  designed  to  assess  whether  or  not  the  two 
formats  measured  the  same  type  of  achievement.   These 
studies  were  generally  conducted  by  an  analysis  of  the 
correlations  between  the  essay  scores  and  scores  on  an 
objective  test  covering  the  same  content.   The  reliabilities 
and  correlation  coefficients  were  compared.   The  results 
indicated  that  substantial  correlations  existed  between 
the two types of tests. However, as noted in the introduction,
the portion of the variance which did not correlate
may have been due to a difference in the function of the
tests or to measurement error (Weidemann & Newens, 1933;
Vernon, 1959, 1962; Andrews, 1968; Godshalk, Swineford,
& Coffman, 1966). These conclusions were reached because
the  reliabilities  of  the  objective  tests  were  higher  than 
the  essay  test  reliability.   However,  reliability  for 
factual  type  questions  was  higher  than  for  analysis 
level  items.   Therefore,  the  more  complex  the  objective 
examination,  the  less  reliable  was  the  score. 

Cieutat (1960) did correlate factual and application
objective  test  items  with  factual  and  application  short 
answer  exercises.   The  correlation  for  the  factual  tests 
was  .62  and  the  correlations  for  the  application  tests 
dropped  to  r  =  .47.   Studies  of  this  nature  support  the 
belief  that  factual  items  are  more  valid  than  applied 
items. 

Noyes (1963) analyzed the essay scores of 646 eleventh
and twelfth grade students. The purpose of the study was
to  predict  writing  ability  from  various  combinations  of 
short  objective  tests,  each  measuring  a  different  aspect 
of  writing  ability.   The  fact  that  the  essay  tests 
correlated  more  highly  with  each  other  than  with  the 
multiple  choice  test  indicated  that  the  essays  may  have 
been  measuring  a  different  type  of  achievement.   The 
study was criticized because an analysis of the intercorrelations
of the objective tests may have altered the
conclusions  (Andrews,  1968). 

Comparisons  of  the  relative  validity  of  essay  and 
objective  tests  were  enhanced  by  the  use  of  multiple 
regression  techniques.   College  Board  studies  by  Godshalk 
et  al.  (1966)  found  that  twenty  minute  essay  scores  did 
make  a  unique  contribution  to  the  predictive  validity 
of  a  one  hour  objective  English  composition  test  when  the 
criterion was the score on a two and one half hour essay
examination. But, in a later study by Modu (1972) it
was  concluded  that  very  little  new  information  was  gained 
by  including  a  twenty  minute  essay  in  an  American  history 
objective  test.   Moreover,  the  essay  scores  may  have 
reflected  factual  knowledge  rather  than  the  more  complex 
processes  such  as  application  or  synthesis. 
Study  Methods 

A  completely  different  line  of  inquiry  into  the 
relationship  between  the  two  test  formats  dealt  with 
questions  about  the  way  in  which  students  learned  and 
retained information. The influence of test format on
study  methods  for  examinations  has  been  the  focus  of 
several  studies.   Observational  and  questionnaire  data 
reported in articles by Terry (1933) and Meyer (1936)
suggested that students were concerned with detail when
preparing for objective examinations. The Meyer study
concluded that achievement on tests of either format
was greater when students expected an essay examination.
Similar results were reported by Katona (1940) and
Weidemann and Newens (1933). However, Vallance (1947)
and French (1956) questioned whether either method of
testing had any effect on the retention of content over
time. No reliable differences in achievement were found
by Hakstian (1971) between students who expected either
an  objective  or  an  essay  test  and  were  given  the  opposite. 
The  results  of  these  studies  make  any  definitive  statement 
on  the  value  of  a  particular  format  for  promoting  good 
study  habits  questionable  at  best. 

In  another  study,  scores  for  both  essay  and  multiple 
choice tests were increased by allowing open book examinations
(Feldhusen, 1961). Results from questionnaires
revealed that students believed that open book examinations
reduced rote memorization and promoted learning
during  testing  regardless  of  the  form  of  the  test. 
Learning  Styles 

Recent articles by Biggs (1970) and Biggs and Braun
(1972) have reported the results of investigations into
the relationship between test format preference and
personality characteristics. Biggs established that a
stable relationship existed between personality characteristics
and study behavior. These different characteristics
were correlated with the results from a factor
analysis of the scores from two essays and three objective
tests. The essay test scores had loaded on one factor,
and the objective test scores loaded on the other factor.
The implication of these findings was that students did
differ in their ability to take essay and objective tests.
Therefore, the grading procedure for a course could
discriminate against some students on the basis of their
learning style. For example, if an instructor weighted
all examinations equally and gave mixed format tests,
the course grade for a student may have depended upon
the particular combination of test formats.

Biggs and Braun termed equal weight scoring of tests
for a course grade the union model of scoring. Scores
based upon the union model were related to types of study
behavior. This comparison revealed that union scoring
favored students who were dependent on the instructor,
organized, tended toward rote learning and could relate
information. The alternative scoring model was the
disjunction model in which individual students were tested
by the specific strategy which worked best for them.
Biggs and Braun rescored the same set of examinations
using this model and found that students who were
independent did better. It should be noted that the
best students did well with either model. Thus, the
discrimination in scoring affected the middle range of
the distribution of scores.
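
The contrast between the two scoring models can be illustrated with a small sketch. It assumes one plausible reading of the description above: the union model averages all tests with equal weight, and the disjunction model credits each student with the score from the format on which that student did best. The student labels and scores are invented for illustration and are not data from Biggs and Braun (1972).

```python
from statistics import mean

# Hypothetical percentage scores; not data from Biggs and Braun (1972).
scores = {
    "student_a": {"essay": 82, "objective": 61},
    "student_b": {"essay": 58, "objective": 85},
    "student_c": {"essay": 90, "objective": 91},
}


def union_score(tests):
    """Equal-weight average over all formats (assumed reading of the union model)."""
    return mean(tests.values())


def disjunction_score(tests):
    """Score on the format that worked best for the student (assumed reading)."""
    return max(tests.values())


for name, tests in scores.items():
    print(name, union_score(tests), disjunction_score(tests))
```

In this invented example the student who is strong on both formats ranks first under either model, while the two students with uneven profiles gain the most from disjunction scoring.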

Mental  abilities  and  test  format.   Several  attempts 
have  been  made  to  discover  the  nature  of  the  difference 
in  mental  function  measured  by  the  essay  and  objective 
test format (Huddleston, 1954; Cast, 1940; Vernon, 1962;
Rothkopf & Thurner, 1970). These studies correlated
the essay and objective scores with a criterion. Lee
and Symonds (1933) reported that objective tests
correlated more highly with intelligence than did essay
tests. The higher correlations may, in fact, have been
due  to  the  higher  reliability  of  the  multiple  choice 
tests.   Andrews  (1968)  stated  that  multiple  choice  and 
essay  tests  correlated  more  highly  with  each  other  than 
with  any  other  assignment  except  the  total  grade  assigned 
for  several  journal  reviews. 
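
The point about reliability can be made concrete with the classical attenuation relation, under which an observed correlation equals the correlation between true scores multiplied by the square root of the product of the two reliabilities. The sketch below is illustrative only; the reliabilities and the true correlation are assumed values and are not taken from Lee and Symonds (1933).

```python
import math


def observed_correlation(true_r, rel_x, rel_y):
    """Correlation expected between two fallible measures (classical test theory)."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_r, rel_x, rel_y):
    """Correction for attenuation: estimate of the true-score correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Assumed values: both formats relate equally to intelligence (true r = .60),
# but the objective test is scored more reliably than the essay.
true_r = 0.60
rel_intelligence = 0.90
rel_objective = 0.90
rel_essay = 0.60

r_objective = observed_correlation(true_r, rel_objective, rel_intelligence)
r_essay = observed_correlation(true_r, rel_essay, rel_intelligence)
print(round(r_objective, 2), round(r_essay, 2))
```

Under these assumptions the objective test shows the higher observed correlation even though both formats are equally related to the criterion at the level of true scores.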

The  studies  previously  cited  were  attempts  to  verify 
intra-individual differences in a student's ability to
succeed  on  both  essay  and  objective  examinations.   There 
was  a  consensus  that  essay  and  objective  tests  may  tap 
somewhat  different  abilities  under  certain  conditions. 
The  length  of  the  essay  and  the  complexity  of  the 
required  thinking  process  were  important  considerations. 
Relatively  few  studies  have  attempted  to  describe  the 
cognitive  abilities  which  might  have  a  bearing  upon 
the  problem. 

Much  of  the  current  interest  in  differential  cognitive 
abilities has been generated by Guilford's (1967) Structure
of Intellect model. This model resulted from a factor
analytic investigation of a variety of thinking abilities.
Memory, cognition, convergent and divergent production,
and evaluation factors emerged from the analyses of
tests of primary mental abilities. The development of
these  factors  in  individuals  was  thought  to  depend  upon 
a  combination  of  innate  and  environmental  influences. 

The cognition factor included an element which
could aid in the interpretation of the differences in
success on various test formats. Guilford identified
a type of cognition in which the implications of actions
were recognized. This ability was of two types, concrete
and abstract foresight. Guilford stated that foresight
was an important ability for the political strategist
or policy maker. Thus, the extent to which the objective
test items measure the same cognitive abilities as well
as the same content would be important in determining
the comparability of the two test formats.

The Production factor in the Structure of Intellect
model is particularly relevant to this study. The
individual tendency toward convergent or divergent
thinking may have a bearing on success on different test
formats. Guilford (1956) explained as follows:

     In convergent thinking, there is usually one
     conclusion or answer that is regarded as unique,
     and thinking is channelled or controlled in the
     direction of that answer. In tests of the
     convergent thinking factors, there is one keyed
     answer to each item. Multiple choice tests are
     well adapted to the measurement of these abilities.
     In divergent thinking, on the other hand, there is
     much searching or going off in various directions.
     This is most clearly seen when there is no unique
     conclusion. (p. 274)

Studies  which  linked  patterns  of  cognitive  abilities 
and  success  on  different  item  formats  have  been  encouraged 
by Carroll (1974) and Messick (1972). Whitely (1976)
followed the suggestion to study the item in order to
further knowledge about abilities and learning. Whitely
had two purposes for her study of the analogy item. One
purpose was to determine if the relational concepts which
are the basis for analogies influence the specific
cognitive aptitudes reflected in analogy item performance. The
second purpose was to discover whether or not individual
differences in the ability to solve analogies could be
attributed to individual differences in processing
relationships. Whitely found that success on particular
types of analogies was related to specific cognitive
abilities.   However,  kinds  of  relationships  tested  on 
different  forms  of  analogy  tests  were  typically  not 
controlled  in  the  item  selection  process. 

Whitely used the French (1951) kit of primary mental
abilities in her analysis. However, Cronbach (1975)
advocated the use of broad ability theories in studies of
individual differences in learning. Broad ability
constructs are of two basic types (Horn, 1976). One type is
commensurate with Cattell's (1971) formulation of fluid
and crystallized ability; the other includes the
hierarchical theories such as Vernon's (1950)
verbal-numerical-educational factor and a
practical-mechanical-spatial-physical factor. Differences in
the development of each
theory relate in part to the extent of their dependence
upon factor analytic techniques. Fluid and crystallized
abilities stem directly from second order factor analyses
of Guilford's primary mental abilities, whereas
hierarchical formulations include some variables in broad
ability constructs on the basis of theory rather than
empirical evidence.

Criticisms of Cattell's theory by Humphreys (1967)
were methodological in nature. Humphreys did not dispute
the existence of the broad ability constructs. Rather,
he criticized the inclusion of some near random variables
in the correlation matrix of primary mental abilities.
He also questioned the decision on the number of factors
to rotate. In a reanalysis of the second order factors,
Humphreys concluded that intellectual speed and personality
factors should be identified along with fluid and
crystallized abilities.

There are other researchers who argue that a creativity
dimension is independent of general ability factors (e.g.,
Cropley, 1972; Rossman & Horn, 1972; Kogan, 1971; Murphy,
1973; Torrance, 1970; Wallach & Kogan, 1965). Creativity
has eluded precise definition, which may be one reason for
the controversy about its measurement. Yet, empirical
indicators of creativity labeled as verbal productive
thinking consistently recur in the literature. Measures
of verbal productive thinking are tests of originality,
fluency and flexibility such as those found in the Torrance
Tests of Creativity. Similar tests which measure these
abilities loaded on the convergent-divergent production
factor in the Structure of Intellect model.

An  investigation  into  the  relationship  of  fluid  and 
crystallized abilities and verbal productive thinking was
conducted by Vernon (1972). In a review of the study,
Horn (1976) reported that verbal productive thinking,
with crystallized intelligence partialled out, was not a
useful predictor of grades, teacher ratings of imagination
or originality, or peer sociometric evaluations. Verbal
productive thinking did contribute to the prediction of
scores on essays and stories beyond the variance due to
fluid and crystallized ability.

In  a  review  of  the  literature  on  mental  abilities, 
Horn (1976) stated that while verbal productive thinking
was largely independent of intelligence, there was doubt
about the extent to which it measured real life creativity.
He suggested that studies be designed to show the difference
in the pattern of predictions for verbal productive
thinking and intelligence. He conjectured that when
achievement of literary comprehension or critical reading
were the dependent variables, a stepwise multiple regression
procedure would select crystallized intelligence first,
then fluid intelligence followed by a little verbal
productive thinking. While it is not hypothesized that
students who score differently on essay and objective
tests are necessarily "real life creatives," deviations
from the pattern of regression weights suggested by Horn
could  indicate  that  the  convergent-divergent  thinking 
factor  operates  differently  on  students  who  score  differently 
on  essay  and  objective  tests. 
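
The order of entry that Horn conjectured can be illustrated with a simplified forward stepwise procedure. The sketch below is not the analysis used in this study: it enters predictors by the size of their increment to R squared rather than by an F-to-enter test, the data are synthetic, the three predictors are generated as independent (which real ability measures would not be), and the weights (.8, .4, .1) are assumptions chosen only to mimic the conjectured pattern.

```python
import random


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(values, on):
    """Remove from `values` the part linearly predictable from `on` (both centered)."""
    slope = dot(values, on) / dot(on, on)
    return [v - slope * z for v, z in zip(values, on)]


def forward_stepwise(predictors, criterion, min_gain=0.0):
    """Return [(name, R^2 gain)] in order of entry.

    At each step the predictor with the largest increment to R^2 enters; the
    criterion and the remaining predictors are then residualized on it.
    Assumes no predictor is an exact linear function of the others.
    """
    y_res = center(criterion)
    total_ss = dot(y_res, y_res)
    remaining = {name: center(col) for name, col in predictors.items()}
    entered = []
    while remaining:
        gains = {
            name: dot(col, y_res) ** 2 / (dot(col, col) * total_ss)
            for name, col in remaining.items()
        }
        best = max(gains, key=gains.get)
        if gains[best] <= min_gain:
            break
        entered.append((best, gains[best]))
        chosen = remaining.pop(best)
        y_res = residualize(y_res, chosen)
        remaining = {n: residualize(c, chosen) for n, c in remaining.items()}
    return entered


# Synthetic scores; the weights are assumptions, not estimates from any study.
rng = random.Random(0)
n = 400
gc = [rng.gauss(0, 1) for _ in range(n)]   # crystallized ability
gf = [rng.gauss(0, 1) for _ in range(n)]   # fluid ability
vpt = [rng.gauss(0, 1) for _ in range(n)]  # verbal productive thinking
reading = [
    0.8 * c + 0.4 * f + 0.1 * v + rng.gauss(0, 0.5)
    for c, f, v in zip(gc, gf, vpt)
]

steps = forward_stepwise(
    {"crystallized": gc, "fluid": gf, "verbal_productive": vpt}, reading
)
for name, gain in steps:
    print(f"{name}: R^2 gain = {gain:.3f}")
```

With real data, an order of entry or a set of increments that departs from this pattern would be the kind of deviation referred to above.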

David Stenhouse (1976) advanced the argument that
creative students would tend to perform better on
objective tests than on essay tests. There were two criticisms
of objective testing which Stenhouse countered to build
his case. First, the opinion that objective tests were
merely verbal recall exercises was discounted by using the
language game concept (Wittgenstein, 1953). A language
game requires a student to understand how words fit
together, how they relate to physical objects, and how
objects can be used. As knowledge increases the language
becomes more packed with meaning. Thus, the "enigma of
expert inarticulateness" can occur (Stenhouse, p. 171).
The student may have a plethora of information from which
it is difficult to choose quickly. Thus, objective test
formats can test the ability of the student to understand
the relevant language game limited only by the ability
of the examiner to formulate effective questions. Stenhouse
attributed the fact that students often are not able to
explain their choices in objective examinations to an
efficient subconscious upon which these students
are willing to rely. This explanation is consistent
with those creativity studies in which creative students
tend to rely on the subconscious and to be high risk takers,
and self confident (Barron, 1969; Bloomberg, 1973).

Stenhouse suggested that creative students may be
characterized in multi-modal test situations by rapid
completion of objective items and short essay responses.
Stenhouse argued:

     Thus, a good objective test by the fact that it
     actually provides the answers and the candidate
     has only to select the appropriate one, allows
     an individual whose learning has not been
     thorough in the usual quantitative sense but
     whose powers of discrimination and judgment are
     high, to score well in relation to his essay
     type score. (p. 177)

Scoring the Essay

The procedures for scoring essay examinations
evolved from suggestions by Sims (1931, 1933) and Cochran
and Weidemann (1939). According to their procedure, the
papers were sorted into from three to five groups after
a cursory reading. The essays were then reread and
shifted as necessary to the appropriate group. Each
question was read separately following a review of the
text and lecture materials relevant to the topic. The
scoring process included a listing of the main points
for an ideal answer. These points were weighted in order
of importance. The development of a key against which to
judge essay content was an attempt to turn readers away
from the distractions caused by spelling, grammatical or
other errors. This systematic approach to the marking
of essays was intended to reduce the variation in ratings
among readers. However, studies contrasting various
approaches to marking compositions found that no matter
which method was used, the marks of individual examiners
diverged widely (Cast, 1939; Hartog & Rhodes, 1936).
Stalnaker (1938) found that weighting the questions
for difficulty did not increase score reliability for
essay examinations. Correlations between weighted and
unweighted scores on the College Board examinations were
nearly perfect.
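
A brief sketch shows why weighted and unweighted totals tend to correlate so highly when the questions themselves are positively related. The scores and weights below are invented for illustration and are not Stalnaker's data; the Pearson correlation is computed directly from its definition.

```python
import math

# Hypothetical marks for six students on three essay questions.
marks = {
    "A": (8, 7, 9),
    "B": (5, 6, 4),
    "C": (9, 9, 8),
    "D": (3, 4, 5),
    "E": (6, 5, 7),
    "F": (7, 8, 6),
}
weights = (1, 2, 3)  # assumed difficulty weights, one per question


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


unweighted = [sum(m) for m in marks.values()]
weighted = [sum(w * q for w, q in zip(weights, m)) for m in marks.values()]

r = pearson(unweighted, weighted)
print(round(r, 3))
```

Because both totals are built from the same question marks, reweighting mostly rescales the total and only slightly reorders the students.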

Later  studies  improved  the  marking  procedures  by 
selecting  representative  essays  to  illustrate  selected 
points  on  the  rating  scale  (Gosling,  1966).   A  chief 
examiner was also added to spot check the rating process
to ensure adherence to standards. Correlations between
the marks of the chief examiner and the average of the
marks of the other raters were extremely high. The
scoring situation was unique in that there were 10 raters
who had considerable experience using the rating
instrument on those particular topics.

Attempts  to  minimize  sources  of  measurement  error 
due  to  rater  unreliability  led  to  studies  comparing 
global  or  "holistic"  rating  with  the  analytical  procedure. 
Global  scoring  yielded  an  assessment  of  the  essay  expressed 
as  a  single  score.   The  score  represented  an  integration 
of  all  of  the  criteria  used  to  judge  the  essay.   Analytical 
scoring  gave  a  composite  score  in  which  the  various  elements 
were  scored  separately  and  summed  for  a  total  evaluation. 
These  methods  were  related  to  the  different  perceptions 
of the nature and function of the essay. Sims (1931)
wrote about the functional type essay which "... reveals
information regarding the structure, dynamics and functioning
of the student's mental life as it has been modified by a
particular set of learning experiences" (p. 17). Sims
(1933) also defined essay examinations as projective
techniques.   The  examinee  was  confronted  with  a  situation 
into  which  he  "projected"  his  personality  and  drew  upon 
his  experiences  and  values  in  formulating  his  response. 
The  latitude  allowed  in  the  response  was  related  to  the 
appropriate  scoring  technique. 
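
Analytical scoring, as described above, can be sketched as a weighted checklist in which the points credited to an answer are summed. The key below is hypothetical: the points and weights are invented for an imagined international relations question and do not reproduce any key used in the studies cited. Global scoring has no comparable computation, since the reader records a single overall rating.

```python
# Hypothetical scoring key: main points of an ideal answer and their weights.
scoring_key = {
    "defines balance of power": 3,
    "gives a historical example": 2,
    "evaluates limits of the concept": 1,
}


def analytical_score(points_credited, key):
    """Sum the weights of the key points the reader credited to the essay."""
    unknown = set(points_credited) - set(key)
    if unknown:
        raise ValueError(f"points not in scoring key: {sorted(unknown)}")
    return sum(key[point] for point in set(points_credited))


credited = ["defines balance of power", "evaluates limits of the concept"]
print(analytical_score(credited, scoring_key), "of", sum(scoring_key.values()))
```

An essay that integrates its ideas well but omits a listed point loses credit under such a key, which is the narrowness of focus that the discussion of global scoring addresses.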

The  more  complex  the  process  of  responding  to  an 
item,  the  more  difficult  it  was  to  develop  a  suitable  key. 
The  danger  in  analytical  scoring  was  that  rating  would 
be too narrowly focused. An essay rated poor when the
score consisted of a sum of the component elements in a
scoring key could be ranked very high under global scoring.
Global scoring permitted an assessment of the
interrelationships of the various components of a good essay.
In practice it appeared that there was little difference
in scoring reliability between the two methods (Coward,
1950; Coffman & Kurfman, 1968). Coward did discover that
two global ratings were finished in the time that one
analytical rating was made with equally reliable results.
Therefore, it would appear that for most purposes, global
scoring is preferable.

Problems in Estimating Reliability

A  major  source  of  error  in  estimating  the  reliability 
of  essay  examinations  is  scoring  error.   Readers  differ 
in  the  standards  they  apply,  in  their  preference  for 
writing  styles,  and  in  their  allocations  of  grades  (French, 
1962; Coffman, 1971). Some raters grade more severely
than others, or they may tend to distribute grades
differently across the scale. The lack of interrater
reliability has been extensively documented (Hartog &
Rhodes, 1936; Finlayson, 1951; Vernon & Millican, 1954;
Pearson, 1955; Noyes, 1963; Coffman & Kurfman, 1968).
Not only do raters differ from each other, the same
rater does not mark the same paper consistently over
time (Marshall, 1967). One of the higher rates of
agreement on ranking the same essays at two separate times
was an eighty-nine percent agreement in a study by
Phillips (1975). This study did not deal with essays
specifically, but with the marking of open ended exercises
in the National Assessment of Educational Progress.

The order in which essays were read has affected
the consistency of scoring also. Stalnaker (1936)
grouped essays by quality following an initial screening
of the papers. Raters were assigned groups of essays
arranged according to a pre-determined pattern of poor
quality essays and superior ones. The grades for average
papers were depressed when they followed good papers.
Conversely, average papers received high marks when they
were read after poor papers. A follow up study by Hales
and Tokar (1975) reached similar conclusions. Therefore,
both studies recommended that the marking procedure take
into account the possibility that the order in which
papers are turned in by students may be related to the
quality of the examinations. For example, students of
similar ability may sit together or may tend to require
a similar length of time to complete the essay. Storey
(1968) developed three nearly identical paragraphs which
were judged to be excellent, fair and poor by two panels
of raters. The paragraphs were submitted to 261 teachers
in homogeneous groups. Scores were distributed similarly
for each group of essays and no significant differences
were found in the means or standard deviations. The
grades awarded reflected teacher set rather than the
value inherent in the paragraphs.

Early  studies  of  the  effect  of  handwriting  on  the 
evaluation of essays were made by James (1927) and
Sheppard  (1929).   The  same  compositions  were  recopied, 
and  the  quality  of  the  penmanship  was  related  to  the  marks 
given.   Chase  (1968)  reexamined  the  question  in  a  study  of 
the  effect  of  handwriting  quality,  spelling  accuracy  and 
the  use  of  a  scoring  key.   Significant  differences  in 
marking were related to handwriting but not to spelling
or  the  use  of  the  scoring  key.   Handwriting  quality  did 
not  have  an  apparent  relation  to  the  grade  on  the  first 
of  the  two  essays,  but  substantial  differences  did  occur 
when  the  second  essay  was  marked.   It  was  suggested  by 
the  author  that  the  readers  may  have  lost  patience  after 
grading  the  first  essay  and  lowered  marks  on  the  second 
in frustration. Markham (1976) used a classification
analysis  and  found  that  various  teacher  characteristics 
(age,  experience,  levels  taught,  or  degree  held)  had  no 
significant  influence  on  the  score  given  to  essays  written 
by elementary children. However, an analysis of variance
of the scores indicated that scores varied significantly
when the dependent variable was a ranking of the essays
on the basis of handwriting quality. Different results
were reported by Marshall (1972) who compared four levels
of  composition  errors  and  four  writing  treatments  (typed, 
neat,  fair,  poor)  in  a  four  by  four  factorial  analysis 
of  variance  design.   In  this  study,  16  forms  of  an  essay 
examination  which  were  identical  in  content  but  different 
in the number of errors and neatness were graded by 480
classroom  teachers.   In  this  case,  no  significant 
differences  were  found  in  the  analysis. 

Another  effect  on  the  reliability  of  essays  was  the 
error  associated  with  the  choice  of  topic.   Ruch  (1929) 
compared  the  reliability  of  eighth  grade  examinations  in 
16 subjects with the reliability of marking the examinations.
The  same  paper  read  twice  by  two  readers  had  an  average 
correlation  of  .62.   The  scores  for  two  papers  written 
by  the  same  student  and  read  by  the  same  person  were 
correlated  at  a  lower  level  (.43).   Consistent  results 
were  also  reported  by  Young  (1962),  Swineford  (1964), 
and  Gustav  (1968).   However,  Wiseman  and  Wrigley  (1958) 
reported  that  differences  between  the  average  scores 
for  children  selecting  different  essay  topics  could 
be  accounted  for  by  differences  in  the  ability  of  the 
children.   The  authors  concluded  that  the  use  of  essay 
examinations  where  children  were  allowed  to  choose 
a  topic  from  a  set  of  topics  was  not  likely  to  introduce 
any error in the marking. The choice of topic served as
a mechanism for the children to sort themselves out and
made  no  real  difference  in  the  final  distribution  of 
scores. 

The  degree  to  which  scores  on  essays  reflect  content 
knowledge  or  composition  skill  has  been  the  subject  of 
much  research.   Scannell  and  Marshall  (1966),  and 
Marshall (1967) used a factorial analysis of variance to
investigate  the  relationship  between  essay  scores  and 
three  levels  of  grammatical  errors  and  four  types  of 
composition errors. It was reported that spelling and
grammatical errors reduced marks, but punctuation errors
did not. Following a principal components analysis of
composition errors, Slotnick (1972) compared very high
and  very  low  papers.   These  papers  could  be  distinguished 
by  quality  of  thought,  spelling,  range  of  vocabulary, 
word  choice,  sentence  structure,  emphasis,  and  paragraph 
organization.   Diederich,  French  and  Carlton  (1961) 
identified  five  characteristics  of  essays  which  contributed 
to  the  variability  in  grades  assigned  to  300  essays  by 
a  cross  section  of  businessmen,  teachers  and  scientists. 
A  factor  analysis  of  essay  and  Scholastic  Aptitude  scores 
resulted  in  five  factors.   These  factors  were  ideas, 
form, flavor, mechanics, and wording. The SAT scores
were similar to the scores assigned by those readers
weighting mechanics and word skills heavily. The SAT
scores were unrelated to scores by readers who graded more
heavily on ideas.

Fosvedt (1965) selected five criteria for evaluating
English  compositions  from  lists  drawn  from  the  National 
Association  of  Teachers  of  English,  testing  services,  and 
journal  articles.   This  approach  to  the  identification 
of criteria differed from those mentioned above in which
criteria were deduced from an examination of the
relationship between the types of errors made and the scores.
Fosvedt  validated  five  criteria:   coherence  and  logic, 
development  of  ideas,  diction,  emphasis,  organizing  through 
sentence  structure,  and  paragraphing.   The  criteria  were 
ranked  by  a  panel  of  ten  judges.   Twenty  themes  were 
evaluated  on  a  three  point  scale  on  each  of  the  five 
criteria.   An  analysis  of  variance  on  the  average  grades 
assigned  resulted  in  significant  differences  on  the  grades 
assigned by teachers and among the criteria. The
conclusion reached by the author was that even though
teachers believed that criteria for judging essays were
important,  they  failed  to  apply  the  criteria  consistently. 

Summary

The results of attempts to measure abilities uniquely
measured  by  essay  examinations  have  been  inconclusive. 
The  acknowledged  difficulty  in  reliably  scoring  essays 
has  tended  to  attenuate  correlation  coefficients  between 
the  scores  and  the  criteria.   Yet,  the  belief  that  essays 
measure  a  unique  combination  of  skills  persists.   The 
task, as Coffman (1971) suggested, is to find valid
criteria to document these skills.

The process of scoring essays has been refined by
the large testing companies. An adequate number of raters
along with several samples of writing tests contribute
to  the  reliability  of  the  results  for  essay  examinations. 
The  mechanics  of  scoring  include  training  the  raters  in 
the  use  of  the  criteria  for  evaluating  the  essays,  sorting 
the  papers  into  categories  prior  to  a  final  reading,  and 
spot  checking  the  final  reading  to  insure  that  similar 
standards are being applied.
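
The gain from adding raters can be approximated with the Spearman-Brown formula, a standard psychometric result that is not named in the studies reviewed here. The sketch assumes that the raters are interchangeable (parallel) and uses .62 only as an illustrative single-rater reliability.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the mean of n parallel raters."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Assumed single-rater reliability of .62, projected for 1, 2 and 4 raters.
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.62, k), 3))
```

The projection rises with each added rater but by a shrinking amount, which is one reason large scoring operations settle on a small fixed number of readings per paper.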

The  content  versus  style  dilemma  has  not  been 
resolved,  although  it  appeared  that  spelling  errors  and 
poor  handwriting  often  had  a  negative  effect  on  ratings. 
Several researchers have investigated the components
upon which content and style ratings were based. While
the criteria that raters may have used were consistent
across  studies,  the  relationship  between  these  criteria 
and  scoring  was  not  clear. 

The  unique  contribution  of  essay  examinations  to 
the  measurement  of  achievement  has  not  been  clearly 
demonstrated. While comparison studies did indicate
that some students score differently on the two types of
examinations, it was not clear to what ability the
difference  could  be  attributed. 

Differences in cognitive abilities may be related
to success on specific types of psychometric items.
The debate is between those researchers who associate
fluid ability with creativity and success on essay
examinations, and those who link creativity, objective
examinations and fluid ability. Crystallized ability is
important in any examination. Where success on test
formats is uneven, differential cognitive abilities may
provide an explanation. However, the difference in the
cognitive requirements within an objective test may be
as great as the difference between essay and objective
test  items. 


CHAPTER  III 
METHOD 

This  study  was  designed  to  assess  the  contribution  of 
fluid,  crystallized  and  creative  abilities  to  the  prediction 
of success on essay and objective examinations. The
investigation was conducted in four stages. A preliminary phase of
the study involved a description of the sample and an
investigation of the independence of a creativity dimension. Next,
the  relative  importance  of  the  ability  measures  for  the 
prediction  of  performance  on  the  essay  and  objective  tests  was 
compared.   The  contribution  of  fluid  and  crystallized  ability 
to  the  prediction  of  concrete  and  abstract  items  was  compared 
in  the  third  stage  of  the  analysis.   It  was  anticipated  that 
differences  in  the  cognitive  levels  of  the  objective  test 
items  would  help  to  explain  the  differences  in  the  students' 
ability  to  succeed  on  essay  and  objective  tests.   The  final 
phase  of  the  study  was  an  investigation  of  the  premise  that 
a  combination  of  fluid,  crystallized  and  creative  abilities 
and  scores  on  concrete  and  abstract  items  could  be  used  to 
classify  students  as  better  on  essays  or  better  on  objective 
tests.   The  subjects  for  this  study  were  168  students  enrolled 
in  an  introductory  course  in  political  science.   A  description 
of  the  subjects,  instruments,  and  statistical  analyses  is 
presented  in  this  chapter. 

The Sample

Students  enrolled  in  an  introductory  political  science 
course were the subjects for this study. There were 171
students enrolled in the course, and 3 students chose not
to participate in the study. The remaining students
received a bonus for participation to be used in the event
their course grade was in question. Listwise deletion
of  cases  due  to  missing  data  was  used. 

The  course  emphasized  the  study  of  international 
relations,  and  the  lectures  and  examinations  stressed 
conceptual understanding more than factual knowledge.
Since the course fulfilled the general education requirement,
students were generally sophomores drawn from a
variety of departments within the university.

The  Instruments 

Five instruments were administered: a questionnaire,
the Cattell Culture Fair Intelligence Test: Scale Three,
the McGraw-Hill Basic Skills Reading Test, the Torrance
Tests of Creativity, and a two-hour final examination. A
two-hour essay and objective midterm examination was also
administered, which gave students experience with the format
of the items used on the final examination.

The questionnaire included eight items drawn from
suggestions by Coffman (1971). The information obtained
from the questionnaire covered the past experience of the
students with other political science courses, with the
instructor, and with essay and objective test formats.
In addition, students were asked to state their test
format preference, if any.

To assess fluid intelligence, the Cattell Culture
Fair Intelligence Test was administered during the second
week of class. Both sections of the test were given to
provide greater score reliability. The test included
100 items in the 4 different subtests: series, classifications,
matrices, and conditions. The manual reported
that there were high intercorrelations among the subtests,
and the test represented a valid general ability factor.
Scale Three of the Cattell test was specifically designed
to discriminate among very intelligent high school and
college students.

Reliability  for  Scale  Three  was  reported  at  r    =  .91 
for  college  undergraduates  as  an  immediate  test-retest 
coefficient.   Test-retest  stability  coefficients  obtained 
by  increasing  the  interval  between  testings  were  reported 
around .80.

This  test  measures  the  "relation  education  capacity  . 
in  quite  different  fields  of  content,  that  is,  verbal, 
numerical,  spatial,  and  social  skills"  (Cattell,  p.  6). 
These  abilities  are  largely  culture  free  and  represent 
fluid  intelligence.   The  number  of  items  and  testing  time 
for  the  four  sections  of  the  Cattell  test  have  been 
reported in Table 1.

Table 1

Items and Time Allotted to Each Test on the Cattell
Culture Fair Intelligence Test, Scale Three

Test                No. of Items    Time (in minutes)

Series                   26                6
Classifications          28                8
Matrices                 26                6
Conditions               20                5

The McGraw-Hill Basic Skills Reading Test, Form A,
was used as a measure of crystallized ability. This test
included scores for reading rate, retention of information,
skimming and scanning, and paragraph comprehension. A
total score was obtained by summing the three part scores;
reading rate scores were not included. Only the reading
comprehension and the retention of information scores were
used in the study. The test was intended to assess specific
skills in reading which are relevant to academic success
in college. The number of items in each subtest, the
testing time, and the total number of items have been
included in Table 2. The internal consistency of the
test was calculated at r = .89 using the Kuder-Richardson
formula twenty. However, the speed factor in the skimming
and scanning section may have inflated the reliability
coefficient.
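
The Kuder-Richardson formula twenty coefficient mentioned above can be computed from a matrix of right/wrong item scores. The following Python sketch is illustrative only: the function name `kr20` and the toy response matrix are assumptions and are not part of the original analysis, and the population variance of the total scores is used as an assumed convention.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n = len(responses)       # number of examinees
    k = len(responses[0])    # number of items
    # Sum of p*q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1.0 - p)
    # Population variance of the total scores (assumed convention).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical toy data: 4 examinees by 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(toy), 2))  # 0.75
```

A speeded section tends to make examinees' item scores move together for reasons unrelated to content, which is one way a coefficient of this kind can be inflated.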


Table 2

McGraw-Hill Reading Test
Summary of Items and KR-20 Reliabilities

                                           No. of             Time
                                           Items     KR-20    (in minutes)

Part I - Reading Rate and Comprehension      20       .65         16
Part II - Skimming and Scanning              30       .88         10
Part III - Paragraph Comprehension           30       .76         40

TOTALS                                       80                   66


Measures  of  reading  skills  are  nearly  pure  measures  of 
crystallized  abilities  (Horn,  1968).   Thus,  the  Cattell 
test  and  the  McGraw-Hill  Reading  Test  represent  the  fluid 
and crystallized ability variables. The Torrance Tests
of Creativity were selected as measures of verbal productive
thinking or creativity. Torrance (1974) defined
creativity as:

. . . a process of becoming sensitive to problems,
deficiencies, gaps in knowledge, missing elements,
disharmonies, and so on; identifying the difficulty;
searching for solutions, making guesses or
formulating hypotheses about the deficiencies;
testing and retesting these hypotheses and
possibly modifying and retesting them; and finally
communicating the results. (p. 8)

To assess these qualities, Torrance included seven
activities which were scored for fluency, flexibility
and originality. The rationale for the selection of
these activities was explained in the technical manual
(Torrance, 1968). The test required 45 minutes to
administer. It was scored according to the directions
given in the scoring manual.

The final examination for the course was a two part
examination requiring two hours to complete. The first
part of the test was a 50 item multiple choice test
developed and pretested by the instructor the previous
year. The test was relatively difficult and was designed
to assess the students' abilities to comprehend and
analyze the course content.

Items in the objective portion of the examination
were categorized in two ways. First, the instructor
grouped the items according to Bloom's (1956) Taxonomy
of Educational Objectives. The classification of the items
by the instructor was reviewed by the investigator and
agreement was reached on the appropriate category for
each item. This categorization ensured that the items
would not be primarily factual. The number of items
which represented each category is reported in Table 3.

Table 3

Number of Items in Categories of Bloom's
Taxonomy of Educational Objectives

Category           No. of Items

Knowledge                6
Comprehension            3
Analysis                 7
Synthesis               29


The second dimension along which items were grouped
was a concrete-abstract differentiation. Concrete items
were defined as those items which related directly to
information given in lectures or the text. These items
were designed to assess the students' ability to analyze
material or ideas specifically emphasized in course
materials. It was anticipated that these items would
favor the conscientious student with strong crystallized
abilities. A second type of item was written in which
the connection with specific course materials was less
direct. These items required the student to recognize
the relevant concepts and make generalizations based
upon an understanding of their interrelationships. These
broader, more global questions were expected to require
higher fluid abilities. The instructor classified 22
items as concrete and 23 items as abstract. Five
items were deleted due to low discriminations. Examples
of concrete and abstract items can be found in
Appendix A.

Scoring 
The objective test was machine scored using a National
Computer Systems scanner. The items were analyzed using
Test Grader II, a computer program adapted from a program
written at the University of Wisconsin. Test Grader II
provided the descriptive test statistics, including item
difficulties, discriminations and point-biserial correlations
with total test score. Internal consistency was
computed using the Kuder-Richardson formula twenty. The
results of the item analysis have been included in Appendix
B.

The  essay  test  was  scored  by  the  instructor  on  two 
separate  occasions.   Global  scoring  was  used.   The  score 
scale  was  the  traditional  scale  from  A  for  an  excellent 
paper  to  E  for  a  failing  paper.   The  alphabetic  grades 
were  assigned  points  on  a  numeric  scale.   Each  alphabetic 
category was subdivided into four groups; thus, the total
numeric scale ranged from one to twenty points. Scoring
reliability  was  assessed  by  correlating  the  scores  on 
two  separate  readings  of  the  essay. 
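
The conversion of letter grades to the twenty-point scale and the correlation between the two readings can be sketched as follows. This Python example is a sketch under stated assumptions: the source does not say how the four subdivisions within each letter were labeled, so the `level` argument (1 to 4) and the function names are hypothetical, and the scores shown are invented for illustration.

```python
import math

LETTERS = "EDCBA"  # E = failing paper ... A = excellent paper


def grade_to_points(letter, level):
    """Map a letter grade and a within-letter level (1-4, assumed) to 1-20."""
    if letter not in LETTERS or not 1 <= level <= 4:
        raise ValueError("expected a letter A-E and a level from 1 to 4")
    return LETTERS.index(letter) * 4 + level


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented scores from two readings of the same five essays.
first_reading = [grade_to_points("A", 2), grade_to_points("B", 4),
                 grade_to_points("C", 1), grade_to_points("D", 3),
                 grade_to_points("B", 1)]
second_reading = [grade_to_points("A", 1), grade_to_points("B", 3),
                  grade_to_points("C", 2), grade_to_points("D", 3),
                  grade_to_points("C", 4)]
print(round(pearson_r(first_reading, second_reading), 3))
```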

The  global  scores  assigned  by  the  instructor  were 
based  upon  these  general  criteria: 

1.  Understanding  the  dilemma  posed  by  the  question 

2.  Synthesis of diverse material and relevance of
examples

3.  Discussion  and  analysis  of  conceptual  issues 

4.  Originality  of  perspective 

Writing style was a factor in scoring to the degree that
good  style  enhanced  the  effectiveness  of  the  argument. 
However,  a  conscious  effort  was  made  to  discount  grammatical 
and  spelling  errors  as  well  as  poor  penmanship.   The  essays 
were  shuffled  and  the  names  obscured  to  reduce  scoring 
bias.

Analysis

This study was designed to compare the contribution
of fluid, crystallized, and creative abilities to the
prediction of success on essay and objective tests.
Reading, abstract reasoning, and a creativity test were
administered along with a questionnaire. A one hour
essay test and a fifty item objective test over the same
course content were also administered. Subjects for this
study were 168 students enrolled in a political science
course.

The  analysis  of  the  data  was  completed  in  four  phases. 
The preliminary phase involved a description of the relevant
characteristics  of  the  population.   In  addition,  a  factor 
analysis  was  conducted  to  validate  the  existence  of  a 
separate  creativity  dimension.   The  contribution  of  the 
ability  measures  to  the  prediction  of  success  on  essay  and 
objective  tests  was  analyzed  in  three  ways.   Next,  the 
extent  to  which  the  ability  measures  would  predict  scores 
on  essay  and  objective  tests  was  established.   Separate 
multiple  regression  analyses  for  each  test  format  were 
compared. The third phase was a consideration of the possibility
that success on abstract and concrete objective
test  items  would  require  a  different  pattern  of  abilities. 
Multiple  regression  was  used  to  analyze  the  contribution  of 
the  ability  measures  to  the  prediction  of  success  on  the 
concrete  and  abstract  items.   The  respective  regression 
weights were compared to determine their similarity.

The premise investigated in the final phase of the
study was that differences in abilities could be used
to predict relative standing on essay and objective tests.
A discriminant function procedure was used to analyze
the classification of students as better on essays, better
on objective tests or the same on both test formats.

Preliminary Analyses

The  analyses  of  the  data  began  with  a  description  of 
the background characteristics of the sample. The description
section was important for two reasons: the sample
was  not  randomly  drawn,  and  previous  educational  experiences 
of  the  students  were  expected  to  influence  the  results  of 
the  study. 

Another preliminary analysis was conducted to examine
the interrelationship of the variables. The existence of
a creativity dimension separate from dimensions of fluid
and crystallized abilities has been debated in the literature.
One hypothesis of this study was that a creativity dimension
would operate differently on essay and objective tests.
Therefore, to establish that a separate creativity dimension
existed for this sample, a factor analysis of the scores
from the reading, Torrance, Cattell and class examinations
was conducted. The computer program was from the Statistical
Package for the Social Sciences (SPSS). Since the purpose
of the analysis was to examine the underlying structure of
the variables, a principal axes solution was selected.
Multiple correlation coefficients were inserted in the
diagonal of the correlation matrix, and iterations were
conducted to arrive at the communality estimates. The
four principal factors with eigenvalues greater than 1.0
were extracted. The four orthogonal factors were rotated
according to the varimax criterion.

Predicting Essay and Objective Test Scores

Multiple  regression  equations  were  developed  to  test 
the  hypothesis  that  measures  of  fluid,  crystallized  and 
creative  abilities  predict  success  on  essay  and  objective 
tests. The predictor variables included in this analysis
were retention, comprehension, originality, fluid intelligence,
major field of study, and experience with
essay tests. The originality score was the only score
selected  from  the  Torrance  Test  for  two  reasons.   First 
of  all,  the  course  instructor  stated  that  originality  was 
an  important  consideration  in  his  scoring  of  the  essay 
examinations.   The  second  reason  originality  scores  were 
selected  came  from  the  criticisms  of  creativity  tests 
in  the  literature.   In  a  critique  of  the  Torrance  Tests 
of Creativity, Harvey, Hoffmeister, Coates, and White
(1970) stated that the originality scores were consistent
across the different activities included in the test.
The fluency and flexibility scores varied with each
activity.   Therefore,  the  use  of  the  originality  test  fit 
the  purposes  of  the  study  and  was  judged  to  be  a  reliable 
measure. 

Two  separate  regression  equations  were  written.   One 
equation  specified  the  final  essay  scores  as  the  dependent 
variable. The second equation specified the final objective
test scores as the dependent variable. The regression
procedure from SPSS was used to complete the analysis.
The order of entry of the variables was predetermined as
follows: crystallized abilities, fluid ability, originality,
major, experience and the interactions. Tests of significance
for the increase in the sums of squares due to each
step in the regression were made. The regression model to
be tested was written as follows:

$$Y = b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5 + b_6X_6 + b_7X_1X_4 + b_8X_2X_4 + E$$

where:

$\hat{Y}$ = predicted scores on the essay or objective tests

$X_1$ = reading retention

$X_2$ = reading comprehension

$X_3$ = fluid intelligence

$X_4$ = originality

$X_5$ = major field

$X_6$ = previous experience with test formats

$X_1X_4$ = interaction of retention and originality

$X_2X_4$ = interaction of comprehension and originality
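
The predetermined order of entry and the test of each step's increment can be illustrated with a small Python sketch. It is not the SPSS procedure used in the study: the function names and the data are invented, an intercept is added by assumption (the model above does not show one), and the F ratio for each step is computed against the residual of the model at that step, which is only one of several conventions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 from least squares of y on the predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(rows[i][a] * y[i] for i in range(n)) for a in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_resid / ss_total


def hierarchical_steps(blocks, y):
    """Enter blocks in the given order; return (name, R^2, F for the increment)."""
    n = len(y)
    entered, results, previous = [], [], 0.0
    for name, cols in blocks:
        entered.extend(cols)
        r2 = r_squared(entered, y)
        df_num = len(cols)
        df_den = n - len(entered) - 1
        f_change = ((r2 - previous) / df_num) / ((1.0 - r2) / df_den)
        results.append((name, r2, f_change))
        previous = r2
    return results


# Invented scores for eight hypothetical students.
retention = [54, 60, 47, 52, 65, 49, 58, 44]
comprehension = [50, 58, 45, 55, 61, 47, 52, 49]
cattell = [0.8, 1.2, 0.3, 1.0, 1.5, 0.2, 0.9, 0.6]
objective = [26, 31, 22, 27, 33, 21, 29, 23]
blocks = [("crystallized", [retention, comprehension]), ("fluid", [cattell])]
for name, r2, f_change in hierarchical_steps(blocks, objective):
    print(name, round(r2, 3), round(f_change, 2))
```

Interaction terms such as $X_1X_4$ would enter as an additional block whose column is the elementwise product of the two predictors.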

The  comparison  of  the  beta  weights  for  the  two  equations 
was  accomplished  by  the  Biomedical  Package,  program  OllV 
(Dixon,  1973).   The  statistical  procedure  has  been  described 
in  the  section  on  profile  analysis  in  Morrison  (1967).   The 
level of statistical significance was set with alpha at .05.
The hypothesis to be tested was that the differences
in the regression weights would be simultaneously equal
to zero.

$$H_0: [B_{11} - B_{21}] = [B_{12} - B_{22}] = [B_{13} - B_{23}] = [B_{14} - B_{24}] = 0$$

A cross validation sample of 100 cases was randomly
drawn from the population. Scores based upon the prediction
weights of the screening sample were correlated with
the observed criterion scores of the calibration sample.
The shrinkage in R² was estimated and the samples were
recombined.
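
The cross validation step described above amounts to applying weights estimated in one sample to the predictor scores of another and correlating the predictions with the observed criterion scores. The Python sketch below shows that arithmetic only; the function names, the weights, and the numbers are hypothetical, and shrinkage is taken here as the derivation R² minus the squared cross-validity correlation, which is an assumed definition.

```python
import math


def predict(intercept, weights, rows):
    """Apply regression weights from one sample to predictor rows of another."""
    return [intercept + sum(w * v for w, v in zip(weights, row)) for row in rows]


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shrinkage(r2_derivation, r_cross):
    """Drop from the derivation-sample R^2 to the squared cross-validity r."""
    return r2_derivation - r_cross ** 2


# Invented weights (as if estimated elsewhere) and invented holdout cases.
weights = [0.10, 0.15]                 # hypothetical retention, comprehension
holdout_rows = [[54, 50], [60, 58], [47, 45], [52, 55], [65, 61]]
holdout_scores = [26, 30, 24, 25, 33]
predicted = predict(12.0, weights, holdout_rows)
r_cross = pearson_r(predicted, holdout_scores)
print(round(r_cross, 3), round(shrinkage(0.90, r_cross), 3))
```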


Predicting Scores on Concrete and Abstract Items

The second phase of the study included an examination
of the objective test items to determine whether or not
students differed in the ability to succeed on the concrete
and abstract items. A further question involved the
possibility that success on a particular category of
objective test item would be related to high scores on
the ability measures. Two regression equations were
written with the scores on the concrete and abstract items
as the dependent variables respectively. Independent
variables were the two reading scores and the fluid
intelligence score. Regression weights were again compared
using the C matrix of the Biomedical Computer program
OllV.

Predicting Differences in Success Across Test Formats

The pattern of the regression weights would not
necessarily be the same for students who score the same on
essay and objective tests and students who score differently
on the two test formats. To pursue this possibility, three
groups were formed. The high essay group included students
whose scores on the essay test were at least one standard
z score above their scores on the objective test. The
high objective test group was defined as those students
whose objective test scores were at least one standard
z score higher than their essay test scores. The third
group included the remaining students whose scores on the
two tests were within one z score. A discriminant
function analysis using the computer program from SPSS
was conducted. The four ability measures of fluid intelligence,
retention, comprehension and creativity were
used to predict group membership with the direct solution
option of the SPSS computer program.
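
The formation of the three groups can be sketched in Python as follows. This is an illustration under assumptions rather than the procedure actually run: the function names are hypothetical, scores are standardized with the sample standard deviation, and "at least one standard z score" is read as a difference of 1.0 or more between a student's two z scores.

```python
import statistics


def z_scores(values):
    """Standardize scores using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def classify_by_format(essay, objective, threshold=1.0):
    """Label each student by the difference between essay and objective z scores."""
    labels = []
    for z_essay, z_objective in zip(z_scores(essay), z_scores(objective)):
        difference = z_essay - z_objective
        if difference >= threshold:
            labels.append("high essay")
        elif difference <= -threshold:
            labels.append("high objective")
        else:
            labels.append("same")
    return labels


# Invented essay (1-20 scale) and objective scores for six students.
essay = [18, 12, 6, 14, 10, 12]
objective = [24, 27, 30, 33, 21, 26]
print(classify_by_format(essay, objective))
```

The resulting labels would serve as the grouping variable in a discriminant analysis with the ability measures as predictors.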

Summary 
The study was designed to assess the contribution of
fluid, crystallized and creative abilities to the prediction
of success on essay and objective tests. To aid in
the  interpretation  of  the  results,  a  questionnaire  was 
administered.   The  items  on  the  questionnaire  provided 
data  on  the  students'  prior  experience  with  the  essay  and 
objective  test  formats.   Other  information  such  as  student 
major and previous courses from the instructor was gathered.

The data were analyzed in several stages. First, the
existence of creativity as a separate factor from fluid and
crystallized intelligence was tested using a principal
axes factor analysis. Multiple regression analyses were
conducted to compare the prediction of success on the
essay and objective tests by the four ability measures
and the measures of experience. The regression weights
were compared using the C matrix of BMD OllV.

The  second  phase  included  an  examination  of  the 
possibility  that  concrete  and  abstract  objective  test 
items  would  require  different  patterns  of  abilities  in 
order  to  succeed  on  the  items.   Multiple  regression 
analysis  was  used  to  investigate  the  relationships  between 
the  item  categories  and  the  ability  patterns. 

A discriminant function analysis was used to study
the contribution of the ability variables to the prediction
of the relative position of students on the essay
and objective tests. Students were categorized into three
groups. Those students whose essay test scores exceeded
their objective test scores by at least one standard z
score comprised the high essay group. The high objective
group included students whose objective test scores exceeded
their essay test scores by at least one standard z score.
The third group was composed of the remaining students
whose scores on the two test formats did not deviate by
more than one standard z score.


CHAPTER  IV 
RESULTS 

This  study  was  an  investigation  of  the  relationship 
between  fluid,  crystallized  and  creative  abilities  and 
success  on  essay  and  objective  examinations.   The  hypotheses 
generated to compare the patterns of abilities which are
related to success on the two test formats are stated
below. Specific hypotheses related to each general
hypothesis have been listed.

1.  The  multiple  correlation  coefficients  representing 
the  regression  of  essay  and  objective  test  scores  on  fluid, 
crystallized  and  creative  abilities  are  equal  to  zero. 

a.  The measure of fluid ability will not increase
the accuracy of prediction beyond the variance predicted
by crystallized abilities for essay and
objective test scores.

b.  Creativity test scores will not increase
the accuracy of prediction of essay and objective
test scores beyond the variance predicted by measures
of fluid and crystallized abilities.

c.  The  previous  experience  with  test  format  and 
number  of  related  courses  will  not  significantly  increase 
the  variance  explained  in  objective  and  essay  test  scores 
by fluid, crystallized, and creative abilities.

d.  The interaction of crystallized abilities
and  creativity  will  not  significantly  increase  the 
variance  explained  in  objective  and  essay  test  scores 
by  fluid,  crystallized  and  creative  abilities,  and 
the  experience  variables. 

e.  There are no significant differences in the
regression weights for measures of fluid, crystallized
and creative abilities in the prediction of success
on essay and objective tests.

2.  The  multiple  correlation  coefficients  representing 
the  regression  of  scores  of  concrete  and  abstract  objective 
test  items  on  fluid  and  crystallized  abilities  are  zero. 

a.  The  measures  of  fluid  ability  will  not 
increase  the  accuracy  of  prediction  of  concrete  and 
abstract  objective  test  items  beyond  the  variance 
predicted  by  the  measures  of  crystallized  ability. 

b.  There  are  no  significant  differences  in 
the  regression  weights  for  measures  of  fluid  and 
crystallized  abilities  in  the  prediction  of  success 
on concrete and abstract items.

3.  There  are  no  significant  differences  in  the 
pattern  of  abilities  for  students  who  score  high  on 
objective  tests,  high  on  essay  tests,  or  equally  on 
essay  and  objective  tests. 

Results from Analysis of Questionnaire Data

The questionnaire served a dual purpose. It provided
information which the instructor routinely gathered
about the background of the students. Items which were
specifically relevant to this study were imbedded in the
questionnaire. The items relating to the general academic
background of the students were reported in Table 4.

Table 4

General Academic Background

                                                  N       %

Age
  20-under                                       114      67
  21-25                                           45      26
  26-over                                         11       7

Prior course from instructor?
  Yes                                             14       8
  No                                             156      92

Number of previous political science courses?
  None                                            59      35
  One                                             39      23
  Two                                             27      16
  Three or more                                   45      26

Are you a transfer student?
  No                                              90      53
  From 2-year school                              60      35
  From 4-year school                              20      12

Two  of  the  facts  from  this  portion  of  the  questionnaire 
were  unexpected.   The  number  of  transfer  students  in  the 
course  was  nearly  50  percent  of  the  total  number  of  students 
enrolled  in  the  course.   The  fact  that  one  fourth  of  the 
class  had  taken  at  least  three  courses  in  the  field  was 
unexpected,  because  the  course  was  an  introductory  course. 
The  responses  to  the  other  items  fit  the  general  pattern 
of enrollment at the university.

Students were asked to state which test format they
preferred and which format was more difficult. An item
was also included which asked students how frequently they
tended to write essay and objective examinations. The
results for these questions were included in Table 5.

Table 5

Summary of Items about Test Format

                                                  N       %

Which test format do you prefer?
  Essay                                           60      35
  Objective                                       64      38
  No preference                                   45      26
  Other                                            1       1

Have you written essay tests in other courses?
  Usually                                         53      31
  About half of the time                          82      48
  Seldom                                          34      20
  Never                                            1       1

Which test format is more difficult for you?
  Essay                                           48      28
  Objective                                       51      30
  Neither                                         71      42


Factor Analysis

A correlation matrix was generated by the scores of
the reading rate for the difficult passage, reading retention,
comprehension, the fluency, flexibility, and originality
scores, the Cattell test of fluid intelligence, and
the midterm and final course examinations. A principal axis
factor analysis of the correlation matrix was conducted.
Four factors with eigenvalues greater than 1.0 emerged.
The four factor matrix was rotated according to the varimax
criterion. The resulting factors were labeled creativity,
fluid and crystallized abilities, final examination, and
midterm examination. The factor loadings and the percentage
of common variance accounted for were reported in
Table 6.

Table 6

Test Loadings on Rotated Principal Axes

                                      Fluid &
                                      Crystallized    Final     Midterm
                        Creativity    Intelligence    Exam      Exam
                            I             II           III        IV

Reading Rate               -.11          -.04          .14       -.01
Retention                  -.11           .66          .18        .05
Skimming & Scanning         .17           .55          .00        .09
Comprehension               .03           .62          .30        .00
Fluency                     .99           .03          .03        .00
Originality                 .88           .06          .02        .02
Flexibility                 .83           .02          .06        .10
Midterm Objective           .03          -.02         -.02        .74
Midterm Essay               .04           .15         -.02        .36
Final Objective             .06           .31          .74        .05
Final Essay                 .11           .11          .50       -.06
Cattell                     .02           .50          .03        .08

Eigenvalues                2.63          1.83          .74        .50

Proportion of
common variance
accounted for               .44           .27          .17        .12

Predicting Success on Essay
and Objective Tests

The Pearson Product Moment correlations among the
predictor and criterion variables have been reported in
Table 7.

Table 7

Correlation Matrix of Predictor
and Criterion Variables

                   Ret.   Comp.   Cat.   Orig.   F. Essay   F.O.   Con.   Abs.

Retention          1.00    .48    .31    -.08      .12      .37    .27    .36
Comprehension             1.00    .33     .06      .27      .40    .33    .37
Cattell                          1.00     .09      .05      .20    .06    .27
Originality                              1.00      .10      .11    .09    .11
Final Essay                                       1.00      .43    .46    .30
Final Objective                                            1.00    .89    .89
Concrete                                                          1.00    .57
Abstract                                                                 1.00

The final objective and final essay test scores were
plotted against each other. The plot of the scores indicated
that differences in success on the two formats were
not related to the achievement level of the student on
either examination.

The means and standard deviations of the predictor and
criterion variables were reported in Table 8. Reliabilities
for the essay and objective tests were also included in
Table 8. Scaled scores for the retention, comprehension
and originality tests were based on a mean of 50 and a
standard deviation of 10 points. Scores for the Cattell
test were scaled based upon a mean of zero and a standard
deviation of one.

Table 8

Summary of the Descriptive Statistics for
the Predictor and Criterion Variables:
RAW SCORES

                              Standard
                    Mean      Deviation     Reliability

Retention           54          9.36
Comprehension       52          8.79
Originality         64         12.01
Cattell               .83        .74
Essay               12.15       4.57           .84
Objective           26.38       5.44           .75


The  score  ranges  on  the  essay  and  the  objective  tests 
were  divided  into  four  groups.   The  groups  represented 
scores  from  high  to  low  on  each  test  with  the  division  of 
the  score  ranges  by  standard  deviation  units.   Table  9 
includes  the  mean  score  on  the  independent  variables  by 
category  of  essay  and  objective  test  score.   This  table 
was  designed  to  correspond  approximately  to  the  alphabetic 
marks  the  students  received  on  the  examinations.   Thus, 
mean  scores  on  the  independent  variables  for  students  who 
received  various  marks  on  each  examination  can  be 
compared.

Table 9

Mean Scores on the Predictor
Variables by Category of Z Scores on
the Essay and Objective Tests

                        1        2        3        4

Essay Test
  Reading Rate         55.3     52.9     52.8     49.8
  Retention            56.8     53.3     54.3     49.1
  Comprehension        55.9     52.6     49.8     46.0
  Originality          67.4     62.2     65.2     56.4
  Cattell                .98      .77      .80      .79

Objective Test
  Reading Rate         55.8     53.3     52.8     50.1
  Retention            59.4     55.2     52.4     50.2
  Comprehension        56.1     53.8     48.6     47.9
  Originality          63.4     67.0     62.3     65.1
  Cattell               1.15      .99      .58      .73

Hypothesis  1 

The  multiple  correlation  coefficients  representing  the 
regression  of  essay  and  objective  test  scores  on  fluid, 
crystallized  and  creative  abilities  are  equal  to  zero. 

This  hypothesis  was  rejected;  the  multiple  correlation 
coefficients  for  predicting  the  objective  and  essay  test 
scores  were  significantly  different  from  zero.   The  multiple 
correlation  coefficient  for  predicting  objective  test 
scores  was  .46,  [F  (4,143)  =  8.10,  p  <  .05].   The  multiple 
correlation  coefficient  for  predicting  essay  test  scores 
by the ability measures was .29, [F (4,143) = 2.75,
p < .05].

The  cross  validation  was  conducted  using  the  regression 
weights  predicting  scores  on  the  objective  test  for  a 
randomly  selected  group  of  100  students.   The  correlation 
of  the  predicted  scores  and  the  obtained  scores  for  the 
validation  sample  was  .39.   The  two  samples  were  merged 
for  the  remaining  analyses. 
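
The cross-validation step described above amounts to applying the weights from one sample to a second sample and correlating the predicted scores with the obtained scores. A minimal sketch follows; the weights, intercept and validation cases are hypothetical and do not come from the study.

```python
from math import sqrt

def predict(weights, intercept, rows):
    """Apply regression weights from the derivation sample to new cases."""
    return [intercept + sum(w * x for w, x in zip(weights, row)) for row in rows]

def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

# Hypothetical weights and validation cases (standardized predictors).
weights = [0.2, 0.2, 0.0, 0.2]
intercept = 0.0
validation_x = [[1.0, 0.5, 0.2, -0.3], [-0.5, -1.0, 0.1, 0.4],
                [0.3, 0.8, -0.6, 1.1], [-1.2, 0.1, 0.9, -0.8]]
validation_y = [0.6, -0.4, 0.2, -0.9]

predicted = predict(weights, intercept, validation_x)
cross_validity = pearson_r(predicted, validation_y)
```

The resulting correlation plays the role of the cross-validation coefficient reported in the text; it is usually lower than the multiple correlation in the derivation sample.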

The results of the regression of the objective test
scores on the ability variables for the cross validation
have been reported in Table 10.

Table 10

Summary of the Results of the
Regression for Cross Validation

                   R       R²     Simple R     Beta

Retention         .27     .07       .27        .192
Comprehension     .31     .10       .25        .210
Cattell           .35     .13       .12       -.031
Originality       .35     .13       .15        .176


Hypothesis 1a. The measure of fluid ability will not
increase the accuracy of prediction beyond the variance
predicted by crystallized abilities for essay and objective
test scores.

The crystallized abilities of comprehension and retention
made a statistically significant contribution to the
explained variance in the objective test scores
[R = .44, F (2,148) = 18.20, p < .05]. However, the
increase in the R² for the measure of fluid ability did
not approach statistical significance [ΔR² = .001, F
(1,149) = .22, p > .05].

The comprehension and retention measures also made a
significant contribution to the explained variance in
the essay test scores. The multiple correlation coefficient
for the prediction of scores on the essay by the
measures of crystallized ability was .27, [F (2,148) =
5.67, p < .05]. The measure of fluid ability did not make
a statistically significant increase in the sums of squares
related to essay test scores [ΔR² = .002, F (1,149) = .27,
p > .05].
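
The tests of the increase in R² reported in this section follow the usual form for hierarchical regression, F = (ΔR² / df added) / ((1 - R² full) / df residual). The sketch below assumes that standard formula; the function name is illustrative, and the inputs are rounded values from the text, so the result is not expected to match the reported F exactly.

```python
def f_for_r2_change(delta_r2, r2_full, df_added, df_residual):
    """F ratio for the increase in R-squared when predictors are added."""
    return (delta_r2 / df_added) / ((1.0 - r2_full) / df_residual)

# Rounded values for the objective test: an increase of .001 for fluid
# ability, a full-model R-squared of about .20, and 1 and 149 degrees
# of freedom.
f_value = f_for_r2_change(0.001, 0.20, 1, 149)
print(round(f_value, 2))
```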

Hypothesis 1b. Creativity test scores will not
increase the accuracy of prediction of essay and objective
test scores beyond the variance predicted by measures
of fluid and crystallized abilities.

The increase in the sums of squares due to the addition
of the measure of creativity did not reach statistical
significance [ΔR² = .004, F (1,149) = .66, p > .05] in the
equation written to predict objective test scores.
The increase in the sums of squares due to the addition
of the measure of creativity in the prediction of essay
test scores was not significant [ΔR² = .009, F (1,149) =
1.39, p > .05].

Hypothesis 1c. The previous experience with test
format and number of related courses will not significantly
increase the variance explained in objective and essay
test scores by fluid, crystallized, and creative abilities.

The increase in R² due to the number of related
courses in the field and the previous experience with
essay tests was not significant for the equation with
objective test scores as the dependent variable [ΔR² = .03,
F (2,145) = 2.92, p > .05]. The increase in R² for the
number of related courses and previous experience with
essay tests was also not significant for the equation which
specified the essay test scores as the dependent variable
[ΔR² = .03, F (2,145) = 2.68, p > .05].

Hypothesis 1d. The interaction of crystallized
abilities and creativity will not significantly increase
the variance explained in objective and essay test scores
by fluid, crystallized and creative abilities, and the
experience variables.

The increase in the R² due to the interaction was
not significant [ΔR² = .06, F (4,143) = 1.66, p > .05] for
the objective test as the dependent variable or for the
essay test as the dependent variable [ΔR² = .04, F
(4,143) = 1.37, p > .05].

A  summary  of  the  contribution  of  the  predictor  variables 
to  the  explained  variance  in  the  objective  test  scores 
was  presented  in  Table  11. 

Table 11

Summary of the Changes in R² due to the
Addition of Predictor Variables using Objective
Test Scores as the Criterion

                   R       R²     R² Change    Simple R

Comprehension     .40     .16       .16          .40
Retention         .44     .20       .04          .37
Cattell           .45     .20       .00          .20
Originality       .46     .21       .01          [illegible]

Standard error = .90


The summary of the contribution of the predictor variables
to the explained variance in the essay test scores was
included in Table 12.

Table 12

Summary of the Changes in R² due to the
Predictor Variables with Essay Test Scores as
the Criterion

                   R       R²     R² Change    Simple R

Comprehension     .27     .07       .07          .27
Retention         .27     .07       .00          [illegible; possibly .12]
Cattell           .27     .07       .00          .05
Originality       .29     .08       .01          [illegible]

Standard error = .97

Hypothesis 1e. There are no significant differences
in the regression weights for measures of fluid, crystallized
and creative abilities in the prediction of success on essay
and objective tests.

The overall hypothesis of no significant differences
in the regression weights was supported in the prediction
of essay and objective test scores by measures of fluid,
crystallized and creative abilities, F (4,147) = 2.19,
p > .05. The standardized regression weights have been
reported in Table 13.

Table 13

Standardized Beta Weights for
Predicting Essay and Objective Test Scores

Dependent                  Independent Variables
Variable       Retention   Comprehension   Cattell   Originality

Essay            .020          .270         -.055       .095
Objective        .241          .266          .025       .110

Predicting Scores on Concrete and
Abstract Objective Test Items

The  abilities  required  to  succeed  on  an  objective  test 
could  be  linked  to  the  cognitive  level  of  the  objective 
test  item  included  in  the  test.   The  items  were  categorized 
as  abstract  or  concrete  depending  upon  the  degree  of 
generalization  required  to  respond  to  the  item.   The 
predictor  variables  (retention,  comprehension,  and  fluid 
ability)  were  used  to  predict  scores  on  the  two  categories 
of items. The means and standard deviations for the
predictor and the criterion variables have been reported
in Table 14.

Table 14

Summary of the Descriptive
Statistics for the Criterion Variables

                   No. of Items     Mean      S.D.

Concrete Items          22          12.59     3.10
Abstract Items          23          12.08     3.09

Hypothesis  2 

The  multiple  correlation  coefficients  representing 
the  regression  of  scores  of  concrete  and  abstract  objective 
test  items  on  fluid  and  crystallized  abilities  are  zero. 

Since originality was not postulated to affect scores
on types of objective test items, only the retention,
comprehension and Cattell scores were used in the analysis.
The multiple correlation coefficient for the regression
of concrete items on the independent variables was .36,
which was significant, F (3,147) = 7.19, p ≤ .05. The
regression of the abstract item scores on the three
independent variables produced a multiple correlation
coefficient of .45, which reached statistical significance,
F (3,147) = 12.19, p < .05.

Hypothesis  2a.   The  measures  of  fluid  ability  will  not 
increase  the  accuracy  of  prediction  of  concrete  and  abstract 
objective  test  items  beyond  the  variance  predicted  by  the 
measures  of  crystallized  ability. 
The Cattell measure of fluid ability did not
improve the prediction of scores on concrete items. In
fact, the inclusion of fluid ability in the model slightly
increased the standard error of prediction. The increase
in R² due to fluid ability scores was ΔR² = .005, F (1,149) =
.80, p > .05. Thus, a model including only the comprehension
and retention scores was adequate. The multiple correlation
coefficient for the reduced model was .35, F (2,148) =
10.40, p ≤ .05. The standard error was .95.

The inclusion of the Cattell scores in the equation
written to predict scores on the abstract items did reduce
the standard error of prediction, but the increase in sums
of squares was not significant, ΔR² = .02, F (1,149) =
2.86, p > .05.

A summary of the contribution of the predictor
variables to the explained variance in the concrete item
scores was presented in Table 15.

Table 15

Summary of the Changes in R² due to the
Addition of Predictor Variables using Concrete
Item Scores as the Criterion

                   R       R²     R² Change    Simple R

Comprehension     .33     .11       .11          .33
Retention         .35     .12       .02          .27
Cattell           .35     .13       .00          .06

Standard error = .95

The summary of the contribution of the predictor
variables to the explained variance in the abstract item
scores was included in Table 16.

Table 16

Summary of the Changes in R² due to the
Addition of Predictor Variables using Abstract
Item Scores as the Criterion

                   R       R²     R² Change    Simple R

Comprehension     .37     .14       .14          .37
Retention         .43     .18       .04          .36
Cattell           .45     .20       .02          .27

Standard error = .91


Hypothesis 2b. There are no significant differences in
the regression weights for measures of fluid and crystallized
abilities in the prediction of success on concrete
and abstract items.

The hypothesis of no significant differences in the
comparable regression weights for the concrete and abstract
items was supported, F (3,147) = 1.95, p > .05. The
standardized beta weights for the two equations have been
reported in Table 17.

Table 17

Standardized Beta Weights for the
Regression of Concrete and Abstract Items on
Measures of Fluid and Crystallized Abilities

                   Comprehension    Retention    Cattell

Concrete Items         .274           .161        -.074
Abstract Items         .229           .213         .134

Predicting Relative Position
on Essay and Objective Tests

The discriminant function analysis was designed to
test the degree to which fluid, crystallized and creative
abilities would predict the classification of students.
The three categories included students whose essay test
scores were one z score higher than their objective test
scores as one group. The second group included students
whose objective test scores were one z score higher than
their essay test scores. The third group was composed of
all students whose scores on the two forms of the examination
were within one z score.
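
The three-way grouping described above can be sketched as a simple rule on the difference between the two z scores. This is an illustration only: the treatment of a difference of exactly one z score is assumed, and the labels are descriptive rather than the group numbers used in the tables.

```python
def relative_group(essay_z, objective_z, threshold=1.0):
    """Classify a student by relative standing on the two test formats.

    Assigning a difference of exactly `threshold` to the "high" groups
    is an assumption; the text does not specify the boundary case.
    """
    diff = essay_z - objective_z
    if diff >= threshold:
        return "high essay"
    if diff <= -threshold:
        return "high objective"
    return "similar"

# Hypothetical students, for illustration only.
groups = [relative_group(e, o) for e, o in [(1.2, -0.1), (-0.5, 0.9), (0.3, 0.1)]]
print(groups)
```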

Hypothesis 3

There are no significant differences in the pattern
of abilities for students who score high on objective tests,
high on essay tests, or equally on essay and objective tests.

The variables were entered in a direct solution; the
combination of variables was expected to predict differences
in success on the two test formats. The summary of the
statistical tests of significance has been presented in
Table 18. The first discriminant function was significant,
and its eigenvalue accounted for 79 percent of the between
group variance.

Table 18

Summary of Statistical Tests

Disc.    Eigen-    Rel.       Canonical      Wilks
Func.    value     Percent    Correlation    Λ        χ²        D.F.    Sig.

1        .195      78.96        .40          .795     33.356    12      .001
2        .052      21.04        .22          .951      7.385     5
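
The entries in Table 18 are related to the eigenvalues by standard formulas: the relative percentage is each eigenvalue divided by their sum, the canonical correlation is the square root of λ / (1 + λ), Wilks' Λ is the product of 1 / (1 + λ) over the remaining functions, and the chi-square is Bartlett's approximation. The sketch below assumes those formulas, six discriminating variables, three groups, and N = 151 (the sum of the group sizes in Table 22); the function name is illustrative.

```python
from math import log, sqrt

def discriminant_summary(eigenvalues, n_cases, n_vars, n_groups):
    """Summary statistics for each discriminant function."""
    total = sum(eigenvalues)
    rows = []
    for k, lam in enumerate(eigenvalues):
        # Wilks' lambda for function k and all functions after it.
        wilks = 1.0
        for value in eigenvalues[k:]:
            wilks /= 1.0 + value
        # Bartlett's chi-square approximation.
        chi2 = -(n_cases - 1 - (n_vars + n_groups) / 2.0) * log(wilks)
        rows.append({
            "percent": 100.0 * lam / total,
            "canonical_r": sqrt(lam / (1.0 + lam)),
            "wilks": wilks,
            "chi2": chi2,
            "df": (n_vars - k) * (n_groups - k - 1),
        })
    return rows

summary = discriminant_summary([0.195, 0.052], n_cases=151, n_vars=6, n_groups=3)
for row in summary:
    print(row)
```

With these inputs the sketch gives values close to those in Table 18. Small differences are expected because the eigenvalues are rounded to three decimals.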


The weights for the discriminant functions have been
reported in Table 19.

Table 19

Standardized Discriminant
Function Coefficients

                   Function 1     Function 2

Retention             .1602         -.3586
Comprehension        -.2309          .7200
Originality          -.0571         -.3687
Cattell              -.1162         -.7956
Abstract             1.0384          .0455
Concrete             -.0209          .1326


The  univariate  tests  for  differences  among  the  groups 
on  the  independent  variables  found  significant  differences 
on  the  concrete  and  abstract  items.   The  results  of  the 
univariate  tests  have  been  reported  in  Table  20. 

Table 20

Univariate Tests for Significant
Differences Among the Groups

Retention            2.17
Comprehension         .96
Originality           .46
Cattell              2.02
Concrete Items       3.92*
Abstract Items      13.47*

*p < .05

The centroids of the groups aided in the interpretation
of the results. Group one, high on objective tests, had
a higher average score on the first discriminant function
(see Table 21). The high essay group had a negative average
score. The third group scored between the high essay and
high objective test groups.

Table 21

Group Centroids

            D.F.        D.F.
Group        I           II

1           .5215      -.3170
2          -.6951      -.1911
3           .0639       .1869

The results of the classification analysis indicated
that group membership could be predicted for 45.7 percent
of the cases. However, much of the error in prediction
was attributed to the third group. (See Table 22).

Table 22

Percentage of Cases
Correctly Classified

                     Predicted    Predicted    Predicted
Group          N     Group 1      Group 2      Group 3

1      N       32       21            4            7
       %                66           12           22

2      N       32        5           19            8
       %                16           59           25

3      N       87       25           33           29
       %                29           38           33
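
The overall classification rate can be recomputed from the counts in Table 22. The sketch below uses the counts as read from that table (rows are actual groups, columns are predicted groups); the variable names are illustrative.

```python
# Counts from Table 22: rows are actual groups 1-3, columns are predicted groups 1-3.
confusion = [
    [21, 4, 7],
    [5, 19, 8],
    [25, 33, 29],
]

total_cases = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
percent_correct = 100.0 * correct / total_cases

# Row percentages: how the members of each actual group were classified.
row_percents = [[round(100.0 * c / sum(row)) for c in row] for row in confusion]

print(total_cases, correct, round(percent_correct, 1))
print(row_percents)
```

The diagonal counts (21, 19 and 29) sum to 69 of 151 cases, which corresponds to the 45.7 percent reported in the text. The third row shows that members of the third group were spread almost evenly across the three predicted groups.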


Summary

The results have been summarized for each phase of
the study. The initial phase was a description of the
characteristics of the sample and an assessment of the
independence of the creativity dimension. It was reported
that approximately one fourth of the students had taken at
least three other courses in the field. Also, almost one
half of the class were transfer students. Students were
nearly evenly divided on test format preference. Roughly
a third of the students preferred objective tests and a
third preferred essay tests. The remaining third expressed
no preference.

The independence of the creativity dimension was
established through a principal axis factor analysis.
The creativity factor accounted for [?]4 percent of the
total variance. The ability and achievement measures
had near zero loadings on the creativity factor.

The  regressions  of  essay  and  objective  test  scores 
on the ability measures had similar results. The multiple
correlation coefficients reached statistical significance.
Crystallized  ability  had  the  highest  correlation  with 
the  essay  and  objective  test  scores.   The  increase  in 
the  sums  of  squares  for  fluid  and  creative  abilities 
did  not  reach  statistical  significance  in  either  equation. 
The  comparison  of  the  respective  regression  weights  for 
the  equations  predicting  essay  and  objective  test  scores 
resulted  in  no  significant  difference. 

The comprehension and retention measures of crystallized
ability made a significant contribution to the
explained variance in the prediction of concrete and abstract
objective  test  items.   The  Cattell  measure  of  fluid  ability 
did  not  make  a  significant  increase  in  the  explained 
variance  in  either  equation.   The  comparison  of  the  weights 
for  the  equations  predicting  concrete  and  abstract  item 
scores  failed  to  reveal  a  significant  difference. 

A  difference  in  the  configuration  of  the  pattern  of 
abilities  was  found  between  those  students  who  scored  higher 
on  essays  and  those  who  scored  higher  on  the  objective  test. 
Students  who  scored  higher  on  objective  tests  tended  to 
get more abstract items correct. Abstract items correlated
higher with the Cattell measure of fluid ability than
did the concrete items. The correlation of concrete item
scores with the Cattell test was near zero. Separate
univariate tests for differences among the groups on each
independent variable resulted in significant differences
in achievement on concrete and abstract items. Students
who  scored  higher  on  objective  tests  had  a  high  positive 
weight  on  the  discriminant  function  which  differentiated 
the  high  essay  and  high  objective  groups.   Students  who 
scored  high  on  the  objective  test  had  a  low  negative 
weight  on  the  concrete  items. 

CHAPTER V
DISCUSSION AND CONCLUSIONS

The problem addressed in this study was to determine
the extent to which measures of fluid, crystallized
and  creative  abilities  would  predict  success  on  essay 
and  objective  examinations.   Several  factors  could 
influence  the  results  of  the  study:   the  characteristics 
of  the  sample,  the  interrelationships  of  the  ability 
measures,  and  the  scoring  procedures.   These  factors 
were  dealt  with  in  the  first  phase  of  the  analysis. 
The remainder of the study was concerned with the identification
and comparison of the patterns of abilities
which  predict  success  or  relative  success  on  essay  and 
objective  tests.   The  probability  that  differences  existed 
in  the  abilities  required  to  succeed  on  particular  types 
of  objective  test  items  was  also  investigated. 

Preliminary  Phase 
Characteristics  of  the  Sample 

There  were  two  major  concerns  about  the  sample. 
The  possibility  existed  that  students  who  had  previous 
courses  from  the  instructor  or  in  the  department  would 
be  advantaged.   Also,  students  who  did  not  have  experience 
writing a particular type of examination could be
disadvantaged. Coffman (1971) stressed that these concerns
must be met in research on essay examinations. Even
though  a  substantial  number  of  students  had  previous 
courses  in  political  science,  knowledge  of  related 
content  in  the  field  did  not  significantly  improve  the 
prediction  of  essay  or  objective  test  scores.   Moreover, 
training  to  write  essays  or  objective  tests  did  not 
contribute  to  the  prediction  of  essay  or  objective  test 
scores.   These  findings  do  not  mean  that  the  variables 
are  no  longer  important.   The  examinations  used  in  this 
course  may  have  been  unusual  for  the  students  regardless 
of their past experience with examinations or the subject
matter. The course description stated that the
ability  to  integrate  ideas  and  to  analyze  current  events 
in  international  affairs  was  stressed.   The  essay  item 
was  a  statement  by  the  Secretary  of  State  which  required 
students  to  recognize  and  analyze  the  relevant  issues. 
Since the topic was broad and not tightly structured for
the student (as Vernon, 1961, suggested), a different type
of essay item would undoubtedly alter the results.
Another  consideration  was  the  novelty  of  the  objective 
items  for  the  students.   Many  students  commented  that  they 
found  the  test  to  be  unusual  and  interesting  in  itself. 
While  other  instructors  in  the  department  used  objective 
tests,  they  did  not  use  the  same  style  of  objective  test 
items. Therefore, the effect of previous experience of
students with either item format was reduced. If a
number of students had taken other courses from the
instructor, then the effect of these variables may have
been more substantial. However, only 14 students indicated
that they had taken other courses from the same instructor.

Interrelationship of the Variables

A  factor  analysis  of  the  three  creativity  scores, 
the  reading  test  scores,  the  measure  of  fluid  ability 
and  the  scores  from  the  course  examinations  was  conducted. 
The major purpose of the analysis was to determine the
independence of a creativity dimension. The four factor
solution was remarkably unambiguous. The creativity
factor accounted for the greatest proportion of variance.
There were near zero loadings of the measures of fluid
and  crystallized  abilities  on  the  creativity  factor. 
Therefore,  the  conceptualization  of  creativity  as  a 
separate  dimension  was  warranted  for  these  data. 

The  fact  that  the  fluid  and  crystallized  abilities 
loaded  on  the  same  factor  was  also  consistent  with  the 
literature.   It  is  only  in  second  order  factor  analyses 
that  fluid  and  crystallized  abilities  separate. 

The  two  factors  representing  the  midterm  and  final 
examinations  may  have  been  expected  to  have  grouped  the 
two  essays  and  the  two  objective  examinations.   However, 
the  similarity  in  content  tested  outweighed  the  similarity 
due  to  test  format.   Moreover,  the  crystallized  ability 
tests  correlated  more  highly  with  the  final  examination 
than  with  the  midterm  examination.   The  lack  of  familiarity 
with  the  format  of  the  test  items  on  the  midterm  may 
account  for  the  change  in  the  correlations.   For  example, 
the  reading  comprehension  score  had  a  zero  loading  on 
the midterm factor which increased to a loading of .30
on  the  final  test  factor.   Both  the  midterm  and  the  final 
examinations  included  essay  and  objective  items. 

Predicting  Success  on  Essay  and  Objective  Tests 
The  intercorrelations  among  the  variables  were 
moderate.   The  correlations  of  the  originality  variable 
with  the  remaining  variables  were  particularly  low,  and 
in  most  cases  failed  to  reach  statistical  significance. 
The very low magnitude of these correlations was not
expected. A recurring charge in the literature was that
creativity measures represented a form of verbal
intelligence. There were, however, some interesting
differences in the patterns of means (see Table 9) which
helped to explain the low correlations. Originality
scores were generally high across the first three grade
categories of the essay test, but fell sharply for the
lowest grade category. Originality scores remained
fairly constant across all grade categories on the
objective test.

Multiple Regression Analysis

The  regression  weights  for  the  essay  and  objective 
test  scores  on  the  measures  of  fluid,  crystallized  and 
creative  abilities  were  not  significantly  different. 
Crystallized  ability  scores  were  important  in  predicting 
both  essay  and  objective  test  scores.   The  measures  of 
fluid  and  creative  abilities  did  not  improve  the  prediction 
of either essay or objective test scores. The multiple
correlation coefficient was higher for the regression of
objective test scores than for essay test scores.

The plot of the residuals indicated that the assumption
of homogeneity of error variance was violated. Most
of  the  error  in  prediction  was  in  the  middle  of  the 
distribution.   The  concentration  of  error  could  be  due 
to  scoring  error  in  the  essay  or  to  less  than  perfect 
reliability  in  the  objective  test.   An  equally  plausible 
explanation  is  that  other  factors  such  as  motivation  or 
study  habits  have  a  greater  effect  for  the  middle  ability 
student  than  for  students  at  either  end  of  the  spectrum. 

The  possibility  that  the  originality  measure  may  have 
had  a  curvilinear  relationship  with  the  essay  or  the 
objective  test  scores  was  considered.   However,  the  pattern 
of  means  showed  an  irregular  rather  than  a  curvilinear 
relationship. Part of the explanation for this irregularity
may have been due to an anomaly of the sample.
There were twelve black students in the sample; two thirds
of these students earned exceptionally high scores on the
originality measure. However, all but one of these
students had lower than average scores on the reading
measures and average or lower grades on the examinations.
Because there were so few students, the suggestion that a
cultural difference was expressed in the originality score
is purely speculative. Further investigation of
this possibility may be warranted.

These findings did not completely explain the relationship
between essay and objective tests. First, the
magnitude of each multiple correlation coefficient was
not high even though they reached statistical significance.
Second, the extent to which students differed in success
on essay and objective tests has not been explicated. The
plot of essay versus objective test scores illustrated
that differences in success occurred all along both
distributions of scores. That is, a substantial number
of students scored higher on one examination than on the
other. This pattern was true for high and low scoring
students on either test. This finding did not coincide
with the Biggs and Braun (1972) study which suggested
that  students  in  the  middle  of  the  distribution  of  scores 
were  the  most  likely  to  be  affected  by  unequal  success 
on  essay  and  objective  tests. 

Predicting  Scores  on  Concrete  and 
Abstract  Objective  Test  Items 

The  abilities  required  to  succeed  on  an  objective 
test  depend  upon  the  cognitive  abilities  required  by  the 
items. Different sets of items may require different
patterns  of  abilities.   Thus,  the  debate  about  the 
association  of  fluid,  crystallized  and  creative  abilities 
with  success  on  essay  and  objective  tests  may  hinge  upon 
the  nature  of  the  items.   The  essay  item  was  broadly 
structured  and  required  analysis  of  the  course  content 
to  succeed.   The  objective  test  items  were  of  two  types: 
concrete  and  abstract. 

The  regression  analyses  of  the  concrete  and  abstract 
items  produced  similar  results.   Fluid  ability  did  not 
significantly improve the prediction of the concrete or
the abstract item scores. Comprehension and retention
were required for both sets of items. This finding was
somewhat unexpected, because the Cattell abstract reasoning
test correlated zero with the scores on the concrete items
and .27 with the scores on the abstract items. Thus,
there was a modest confirmation that the abstract items
were in fact measuring an abstract reasoning ability.
Undoubtedly,  the  failure  of  the  Cattell  measure  to  enter 
the  equation  was  due  to  its  correlation  with  retention 
and  comprehension.   There  was  not  enough  unique  variance 
due  to  fluid  ability  once  the  crystallized  abilities 
entered  the  equation.   Differences  in  the  regression 
weights  did  approach  statistical  significance.   Perhaps 
a  further  refinement  of  the  items  would  substantiate 
the  difference  at  a  conventional  level  of  statistical 
significance. 

It was anticipated that students with strong crystallized
abilities would tend to do well on the concrete
items; however, the Pearson product-moment correlations
of the two crystallized abilities, comprehension
and retention, with item scores were higher for abstract
items than for concrete items. The correlation between the
scores on the concrete items and the essay test was higher
than the correlation between the abstract items and the essay
test. Thus, a new relationship is beginning to emerge.
It may be that this essay tended to measure the ability
of the students to comprehend and relate the course material.
Thus, students who did better on the essay tended to do
better  on  concrete  items  which  were  specifically  tied 
to  course  materials.   These  items  were  as  difficult  for 
the  class  as  the  abstract  items.   (See  Table  14). 
Therefore,  it  cannot  be  stated  that  the  differences  are 
simply an artifact of the grading standards or the difficulty
of the items. The relationship also cannot be
attributed  solely  to  memory  of  factual  material.   Neither 
the  essay  nor  the  set  of  concrete  items  was  designed  to 
test  factual  knowledge. 

Predicting  Relative  Success  on 
Essay  and  Objective  Tests 

The final question to be addressed was whether or
not a linear combination of fluid, crystallized and
creative abilities would differentiate groups of students
who score higher on one test format. The students were
grouped as higher on the essay, higher on the objective
test, or the same on both tests. The discriminant function
analysis resulted in one statistically significant function.
This function differentiated the high essay and high
objective groups. Success on abstract items and retention
were the variables which had the highest positive weights
on the discriminant function. The high objective group
had a moderately high mean on the function. The high
essay group had a negative mean. Therefore, the high
objective group was characterized as stronger on abstract
reasoning and retention while the high essay group was
weaker in these abilities. The comprehension and
creativity scores did not differentiate the essay and
objective groups. Those students who scored equally well
on both test formats could not be predicted by the ability
variables. This group scored between the essay and
objective test groups on the discriminant function and was
equally likely to belong to either group.
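
The logic of a discriminant function can be illustrated with a
minimal two-group sketch. It is not the analysis reported here,
which involved three groups and more predictors. The sketch assumes
Fisher's linear discriminant for two groups and two predictors
(abstract item score and retention score); all scores, group sizes
and function names are hypothetical and chosen only for illustration.

```python
from statistics import mean


def centroid(group):
    """Mean of each of the two predictors for a list of (x1, x2) pairs."""
    return mean(p[0] for p in group), mean(p[1] for p in group)


def scatter(group):
    """Sums of squares and cross-products about the group centroid."""
    c1, c2 = centroid(group)
    s11 = sum((p[0] - c1) ** 2 for p in group)
    s12 = sum((p[0] - c1) * (p[1] - c2) for p in group)
    s22 = sum((p[1] - c2) ** 2 for p in group)
    return s11, s12, s22


def fisher_discriminant(group_a, group_b):
    """Weights w = S_w^{-1} (centroid_a - centroid_b) for two predictors."""
    a11, a12, a22 = scatter(group_a)
    b11, b12, b22 = scatter(group_b)
    # Pooled within-group scatter matrix [[s11, s12], [s12, s22]].
    s11, s12, s22 = a11 + b11, a12 + b12, a22 + b22
    det = s11 * s22 - s12 * s12
    (a1, a2), (b1, b2) = centroid(group_a), centroid(group_b)
    d1, d2 = a1 - b1, a2 - b2
    # Multiply the centroid difference by the inverse of the 2x2 matrix.
    w1 = (s22 * d1 - s12 * d2) / det
    w2 = (-s12 * d1 + s11 * d2) / det
    return w1, w2


def discriminant_scores(group, weights, center):
    """Score each student on the function, centered on the grand centroid."""
    return [weights[0] * (p[0] - center[0]) + weights[1] * (p[1] - center[1])
            for p in group]


# Hypothetical (abstract item score, retention score) pairs.
high_objective = [(20, 30), (22, 33), (19, 31), (23, 35)]
high_essay = [(15, 27), (14, 25), (17, 28), (16, 24)]

weights = fisher_discriminant(high_objective, high_essay)
grand_center = centroid(high_objective + high_essay)
mean_objective = mean(discriminant_scores(high_objective, weights, grand_center))
mean_essay = mean(discriminant_scores(high_essay, weights, grand_center))
```

With scores centered on the grand centroid, the group with the higher
predictor means receives a positive mean on the function and the other
group a negative mean, which is the pattern described above for the
high objective and high essay groups.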

The  fact  that  differences  between  the  high  essay 
and high objective groups were found on the concrete and
the abstract items was not surprising. The difference
was, in part, a function of the categorization of the
three  groups.   Students  higher  on  essays  had  necessarily 
lower  objective  test  scores.   However,  it  was  the  abstract 
items  which  differentiated  the  groups  on  the  discriminant 
function. 

Summary 

The  overall  pattern  of  the  regression  of  essay  and 
objective  test  scores  on  the  fluid,  crystallized  and 
creative  ability  scores  did  not  result  in  statistically 
significant  differences.   Crystallized  ability  scores  made 
the  largest  contribution  to  explained  variance  in  both 
equations. Fluid and creative abilities did not improve
the  prediction  of  either  essay  or  objective  test  scores. 

The  regression  of  the  scores  on  abstract  and  concrete 
items  on  measures  of  fluid  and  crystallized  abilities  did 
not  result  in  significant  differences  in  the  overall 
pattern  of  the  regression  weights.   Comprehension  and 
retention scores were retained in the model for each equation.

The prediction of relative standing on essay and
objective tests resulted in a linear combination of
variables which differentiated students who scored
higher on essay tests from those who scored higher on
objective tests. Students who scored higher on
objective tests tended to receive higher scores on the
abstract objective test items and on the retention
measure.

Implications  of  the  Study 

This  study  found  no  significant  differences  in  the 
pattern  of  abilities  which  predict  success  on  essay  and 
objective tests. However, the investigation was limited
to the comparison of an unstructured essay and an objective
test which was designed to measure complex mental
processes. Variations in the structure of the essay or
objective test items may offer different insights into the
relationship between the two test formats. Moreover,
further  refinement  of  the  essay  and  objective  test  items 
could  clarify  the  reasons  for  the  lack  of  homogeneity 
in  error  variance.   Another  possible  avenue  of  research 
would  be  to  investigate  cultural  differences  reflected 
in  scores  on  creativity  measures  in  the  event  that  these 
differences may have obscured the relationship between
the  tests. 

No differences in the comparison of regression weights
were found, but students did differ in their ability to
succeed on essay and objective tests. One third of the
students scored substantially higher on either the essay
or  the  objective  test.   The  differences  in  success  were 
related  to  success  on  a  particular  type  of  item.   Students 
who  scored  well  on  the  objective  test  had  higher  scores 
on  abstract  objective  items  and  on  retention.   Students 
who  were  more  successful  on  the  essay  also  tended  to  do 
well on the concrete objective items. Further verification
of this finding both within and across disciplines
should be made.

An  analysis  of  the  writing  ability  of  the  students 
could  clarify  the  error  in  prediction.   It  may  be  that 
the  ability  to  organize  and  express  ideas  clearly  was  a 
deciding  factor  in  the  scoring  of  essays  in  the  middle  of 
the  distribution.   Part  of  the  folklore  in  grading  essays 
is  that  it  is  relatively  easy  to  differentiate  the 
excellent  and  poor  papers.   The  problem  in  scoring  is  to 
separate  the  average  papers  from  those  which  are  good 
but  not  outstanding. 


CHAPTER  VI 
SUMMARY 

This  study  was  an  investigation  of  the  hypothesis 
that  different  cognitive  abilities  were  measured  by 
essay  and  objective  tests.   The  study  compared  a  broad, 
unstructured  essay  with  an  objective  test  consisting  of 
concrete  and  abstract  items.   The  rationale  for  the  study 
was  that  differences  in  students'  achievement  on  essay 
and  objective  tests  could  be  explained  by  a  combination 
of  student  and  test  variables.   The  student  variables 
were  of  two  types.   The  first  set  of  variables  included 
measures  of  fluid,  crystallized  and  creative  abilities 
(Cattell, 1963; Rossman & Horn, 1972). The other set
of variables included the students' previous experience
with  essay  examinations  and  related  courses. 

The  attributes  of  the  tests  would  also  be  related 
to  differences  in  success  on  the  two  formats.   Of  partic- 
ular concern  were  the  reliabilities  of  the  examinations 
and  the  cognitive  level  of  the  items.   A  50  item  objective 
test  was  developed  using  classical  test  development 
procedures;  reliability  was  reported  as  a  measure  of 
internal  consistency.   The  objective  test  items  were 
categorized  as  concrete  when  they  were  related  directly 
to  information  given  in  lectures  or  the  text.   The  items 
were designed to assess the students' ability to analyze
material or ideas drawn from the course content. It was
anticipated that these items would favor students with
stronger crystallized abilities. A second type of item
was written in which the link to specific course content
was  less  direct.   These  items  were  labeled  abstract. 
The  abstract  items  required  that  the  students  recognize 
the  relevant  concepts  and  make  generalizations  based 
upon  an  understanding  of  their  interrelationships.   These 
broader  items  were  expected  to  require  more  fluid  ability. 
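
The summary above does not name the internal consistency coefficient
that was reported for the objective test. As an illustration only, the
sketch below assumes the Kuder-Richardson formula 20 (KR-20), a common
choice for items scored right or wrong; the function name and the
response data are hypothetical.

```python
def kr20(responses):
    """KR-20 internal consistency for dichotomous (0/1) item scores.

    responses: one list of 0/1 item scores per student.
    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / variance of total scores),
    using the population variance of the total scores.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(student) for student in responses]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    sum_pq = 0.0
    for i in range(n_items):
        # Proportion of students answering item i correctly.
        p = sum(student[i] for student in responses) / n_students
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses: 5 students by 4 items (1 = correct).
example_responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
example_reliability = kr20(example_responses)
```

The coefficient approaches 1 as students who answer one item correctly
tend to answer the other items correctly as well.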

The reliability of essay tests has generally been
lower than that of objective tests. Reduced reliabilities have
been  traced  to  problems  in  the  construction  of  the  essay 
item  and  to  problems  in  scoring.   The  one  hour  essay  item 
for  the  final  examination  was  designed  to  maximize  the 
difference  between  the  essay  and  the  objective  test 
(Vernon,  1961).   The  essay  item  was  broad  and  not  highly 
structured,  and  global  scoring  was  used.   The  criteria 
for  scoring  included:   understanding  the  dilemma  posed 
by the question, synthesis of diverse material and relevance
of examples, discussion and analysis of conceptual
issues, and originality of perspective. Reliability of
scoring was established by correlating the scores from
two separate readings of the essays.
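
Correlating two readings of the same essays can be sketched as
follows. The sketch assumes a Pearson product-moment correlation, which
the summary does not state explicitly for the scoring reliability; the
function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical global scores from two separate readings of six essays.
first_reading = [4, 7, 5, 9, 6, 8]
second_reading = [5, 7, 4, 9, 6, 7]
scoring_reliability = pearson_r(first_reading, second_reading)
```

A value near 1 indicates that the two readings ranked the essays in
nearly the same way.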

The subjects for the study were 168 students enrolled
in  an  introductory  course  in  political  science.   Each 
student  was  asked  to  complete  a  questionnaire  and  four 
examinations. The questionnaire was designed to gather
background information from the students. The examination
used  to  measure  crystallized  ability  was  the  McGraw-Hill 
Reading  Test.   Fluid  ability  was  measured  with  the  Cattell 
Culture  Fair  Intelligence  Test:   Scale  Three.   The  Torrance 
Tests  of  Creativity  were  also  administered.   The  final 
examination was a one-hour essay and a fifty-item
multiple-choice test.

The  investigation  of  the  contribution  of  fluid, 
crystallized  and  creative  abilities  to  the  prediction  of 
success  on  essay  and  objective  tests  was  conducted  in 
four  stages.   The  preliminary  phase  of  the  study  involved 
a  description  of  the  sample  and  an  investigation  of  the 
independence  of  a  creativity  dimension.   The  questionnaire 
data revealed that more students had already
taken other political science courses than was expected
for an introductory course. This fact was expected to
influence the outcome of the study. However, many of
the students had transferred from other institutions;
thus, only a few students had taken previous courses from
the instructor and were familiar with the testing procedure.
The contribution of these student variables to the explained
variance in essay and objective test scores did not reach
statistical significance at the level specified for the
study.

The other concern in the preliminary phase of the
study was to document the independence of a creativity
dimension for these data. Part of the controversy
surrounding models of the organization of human abilities
has been the role of creativity in the models. The
creativity  factor  was  included  in  the  study  on  the  basis 
of  a  factor  analytic  investigation  of  the  underlying 
relationships  between  the  ability  and  achievement  variables. 
A principal axis analysis was conducted and four factors
with eigenvalues greater than one emerged. Forty percent
of the variance was explained by the creativity factor.
A fluid and crystallized ability factor and midterm and
final examination factors were revealed in this analysis.
The midterm and final examinations did not load on the
same factor. One reason that these course examinations
did not correlate more highly was the novelty of the items
for the students. The midterm examination served as a
vehicle for providing some familiarity with the item
formats. Factor loadings of the ability measures on the
midterm  examination  were  near  zero,  whereas  the  crystal- 
lized ability  scores  did  correlate  moderately  with  the 
factor  for  the  final  examination. 

The second phase of the study was a comparison of
the regression of essay and objective test scores on the
measures of crystallized, fluid and creative abilities.
The increase in R² for fluid and creative abilities beyond
the variance explained by crystallized abilities was not
statistically significant in either equation. The interaction
of crystallized ability measures and creativity
was also not significant. The differences in the beta
weights for the equations written to predict the essay
and objective test scores were tested to see if they were
simultaneously equal to zero. The differences in the
beta weights were not significantly different from zero.
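
The test of an increase in R² when predictors are added to an equation
is conventionally an F ratio comparing the full and reduced models. The
sketch below shows that standard formula. Only the sample size of 168
comes from this study; the R² values and the predictor counts are
hypothetical, and the function name is illustrative.

```python
def r2_change_f(r2_full, r2_reduced, n, k_full, k_reduced):
    """F ratio for the increase in R-squared from a reduced to a full model.

    F = ((R2_full - R2_reduced) / (k_full - k_reduced))
        / ((1 - R2_full) / (n - k_full - 1))
    with (k_full - k_reduced) and (n - k_full - 1) degrees of freedom,
    where k counts the predictors in each model and n is the sample size.
    """
    numerator = (r2_full - r2_reduced) / (k_full - k_reduced)
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator


# Hypothetical values: two crystallized measures in the reduced model,
# fluid and creative measures added in the full model.
f_ratio = r2_change_f(r2_full=0.30, r2_reduced=0.25, n=168,
                      k_full=4, k_reduced=2)
```

The resulting ratio is compared with the critical value of F for the
stated degrees of freedom at the significance level chosen for the
study.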

The third stage of the analysis was an investigation
of the relationship of crystallized and fluid abilities
to scores on concrete and abstract objective test items.
This analysis was conducted because the relationship
between scores on essay and objective tests could be a
function of the cognitive level of the objective test
items. However, fluid ability did not make a significant
contribution beyond the variance explained by
crystallized ability even though the abstract item
scores did correlate moderately with the fluid ability
measure. Concrete item scores had a near zero correlation
with the fluid ability measure. The differences
in the beta weights for the equations with the concrete
and abstract item scores as the dependent variables were
not significantly different from zero. Further refinement
of the items may substantiate a difference in the
abilities required to succeed on the concrete and
abstract items.

The  purpose  of  the  final  analysis  was  to  investigate 
differences  in  fluid,  crystallized  and  creative  abilities 
for  students  who  scored  higher  on  the  essay,  higher  on 
the  objective  test,  or  the  same  on  both  examinations. 
The combination of the fluid, crystallized and creative
ability scores and scores on concrete and abstract items
was expected to differentiate students who performed
better on the essays from those who performed better on
the objective test. A discriminant function was found to
differentiate the high essay from the high objective
score group. Higher scores on the abstract items
characterized the high objective group. The high essay group
did relatively better on the concrete items than the
abstract items. Therefore, the hypothesis of this study
that fluid ability would be more closely associated with
success on essay examinations was not supported. Students
who scored higher on the objective test items were more
successful on abstract items than on concrete items.
Students who performed equally well on both examinations
were equally successful on the concrete and abstract
items.


APPENDIX A
EXAMPLES OF CONCRETE AND ABSTRACT ITEMS

Examples  of  Concrete  Items 

1.  Which  of  the  following  is  not  a  factor  explaining 
the  diminished  role  of  Congress  in  foreign  policy  making? 

a.  the crisis nature of foreign policy demands
rapid  responses. 

b.  expenditures  related  to  foreign  policy  are 
too  complex  for  easy  understanding. 

c.  Congress  lacks  the  staff  and  expertise  of 
the  executive  branch. 

d.  foreign  policy  formulation  often  requires 
secrecy. 

e.  the  reluctance  of  the  Congress  to  advise  and 
consent  to  treaties  proposed  by  the  President. 

2.  United  States  activities  in  Latin  America  bear 
similarities  to  Soviet  activities  in  Eastern  Europe  EXCEPT 

a.  for  the  willingness  to  permit  a  variety  of 
political  system  types. 

b.  for  efforts  to  keep  other  major  powers  out. 

c.  for  the  heavy  involvement  in  the  economies. 

d.  for  the  use  of  official  and  unofficial  means 
to  influence  governments. 

e.  for  the  willingness  to  use  short-term 
military  intervention. 

Examples of Abstract Items

1.  Choose the statement below that best categorizes
the actions described in questions 2 through 4.

Select (A) if the action takes advantage or tries
to take advantage of the existing global economy.

Select (B) if the action secures or tries to
secure reforms, or minor changes within the existing
global economy.

Select (C) if the action revolutionizes or tries
to drastically alter some aspect of the global economy.

Select (D) if the action provides or attempts
to provide insulation or isolation from the global
economy.

2.  The  nationalization  of  oil  companies  by  Peru. 

3.  The  effort  of  the  Jamaican  government  to  control 
investments  by  multinational  aluminum  companies. 

4.  The  use  of  modern  communications  facilities  to 
manipulate currency exchange rates.


APPENDIX B
ITEM ANALYSIS OF THE FINAL OBJECTIVE TEST


QUESTIONNAIRE

Directions:   Please  complete  the  questionnaire.   Check 
the  appropriate  response  and  then  fill  in  the  corresponding 
bubble on the answer sheet. When responding to the items
below,  consider  an  essay  exam  to  be  one  which  requires 
several  paragraphs  to  complete  each  question. 

1.  In which age category are you?
(a) 20 or under    (b) 21-25    (c) over 25

2.  Have you taken a course from this instructor before?
(a) yes    (b) no

3.  How many Political Science courses have you completed?
(a) none    (b) one    (c) two
(d) three    (e) four or more

4.  Did you transfer to this university from:
(a) a four year school    (b) a two year school
(c) did not transfer

5.  Which test format is more difficult for you?
(a) essay    (b) objective    (c) neither

6.  In your previous courses did you write essay exams:
(a) usually    (b) about half the time
(c) seldom    (d) never

7.  Which test format do you prefer?
(a) essay    (b) objective    (c) other
(d) it makes no difference

(d)  it  makes  no  difference 

8.  What  is  your  major?   